Machine Learning – Transform, Manage, and Prepare Data

There are many techniques to consider when you want to format data values for machine learning. Well-organized data improves a machine learning algorithm's ability to train a model and make predictions efficiently. Two techniques, normalization and denormalization, are discussed here in more detail. Fundamentally, normalizing data values means changing their scale without distorting the differences in their ranges.

Normalization

Normalization is a technique that is often applied when preparing data for machine learning. The goal of normalization is to change the values of numeric columns in the dataset to use a common scale, without distorting differences in the ranges of values or losing information. Consider a query that uses aggregate, analytic, and mathematical T-SQL functions to perform some exploratory analysis of brain wave reading values by scenario and frequency. The result of such a query may be something similar to the following.

Notice the wide range of numbers in the output, where the smallest number is 0.806404 and the largest is 1522.496. If you attempted to visualize such a set of numbers, as shown in Figure 5.29, the chart might not be insightful, because the largest values dwarf the smallest ones.

FIGURE 5.29 Not normalized brain waves data
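
To make the idea of a common scale concrete, here is a minimal Python sketch of min-max normalization, one common way to rescale values to the range 0 to 1. It is an illustration only and is not necessarily the method used in the exercise. The function name `min_max_normalize` and the sample readings are hypothetical; only the smallest (0.806404) and largest (1522.496) values come from the text above.

```python
def min_max_normalize(values):
    """Linearly rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical readings; only the first and last values appear in the text.
readings = [0.806404, 4.25, 37.9, 212.0, 1522.496]
normalized = min_max_normalize(readings)

for raw, scaled in zip(readings, normalized):
    print(f"{raw:>12.6f} -> {scaled:.4f}")
```

Because the rescaling is linear, the ordering and the relative spacing of the readings are preserved; only the scale changes, which is what lets values from different columns share one axis.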

Perform Exercise 5.10 to normalize that data so that it conveys more when visualized.

Raymond Gallardo
