1. Reasons
1.1. Outliers
1.2. Different units of measurement (after merging datasets)
1.3. Different ranges
2. Process
2.1. Building coefficients
2.1.1. density
2.1.2. ratio
2.1.3. proportion
2.1.4. average
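A minimal sketch of how the four coefficient types above might be derived from raw columns. The column names, the sample row, and the reading of each type (density as count per unit area, ratio as part to part, proportion as part to whole, average as total per case) are assumptions made for illustration, not taken from the outline.

```python
def add_coefficients(row):
    """Return a copy of `row` with four derived coefficients added.

    All column names here are hypothetical.
    """
    out = dict(row)
    urban = row["urban_population"]
    rural = row["population"] - urban
    out["density"] = row["population"] / row["area_km2"]        # count per unit of area
    out["urban_rural_ratio"] = urban / rural                     # part compared to part
    out["urban_proportion"] = urban / row["population"]          # part of the whole, in [0, 1]
    out["avg_income"] = row["total_income"] / row["population"]  # total divided by count
    return out


# Made-up sample row, for illustration only.
region = {
    "name": "A",
    "population": 1200,
    "area_km2": 40.0,
    "urban_population": 900,
    "total_income": 36_000_000,
}
derived = add_coefficients(region)
```

Coefficients like these put cases of very different size on a comparable footing, which is often enough to make merged sources usable before any rescaling.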
2.2. Rescaling
2.2.1. standardization
2.2.1.1. gives every variable a mean of 0 and a standard deviation of 1 (all become dimensionless)
2.2.1.2. sensitive to outliers (they shift the mean and inflate the standard deviation)
2.2.2. normalization (min-max)
2.2.2.1. scales numerical features to a fixed range, usually [0, 1]
2.2.2.2. sensitive to outliers (the minimum and maximum define the range)
2.2.3. robust techniques
2.2.3.1. like standardization, but uses the median instead of the mean and the interquartile range (IQR) instead of the standard deviation
2.2.3.1.1. holds up well in the presence of outliers
2.2.3.2. undefined if quartiles 1 and 3 are equal (IQR = 0, so the division fails)
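The three rescaling techniques above can be sketched with the Python standard library alone. The function names, the use of the population standard deviation, and the "inclusive" quartile method are choices made for this sketch; other conventions give slightly different numbers.

```python
import statistics


def standardize(values):
    """z-score: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std (an assumption of this sketch)
    return [(x - mean) / std for x in values]


def min_max(values, low=0.0, high=1.0):
    """Map the observed minimum to `low` and the observed maximum to `high`."""
    lo, hi = min(values), max(values)
    return [low + (x - lo) * (high - low) / (hi - lo) for x in values]


def robust_scale(values):
    """Subtract the median, divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Quartiles 1 and 3 coincide, so there is nothing to divide by.
        raise ValueError("IQR is zero; robust scaling is undefined")
    return [(x - median) / iqr for x in values]


# Made-up sample with one outlier (100.0).
data = [1.0, 2.0, 3.0, 4.0, 100.0]
z = standardize(data)
mm = min_max(data)
rs = robust_scale(data)
```

With this sample, min-max squeezes the four ordinary values into the bottom few percent of the range, while robust scaling keeps them evenly spaced around 0 and leaves only the outlier far away.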
2.3. Discretizing into K groups
2.3.1. frequency
2.3.1.1. K bins with the same count
2.3.1.1.1. cases are spread uniformly across bins
2.3.2. uniform
2.3.2.1. K bins with the same width
2.3.2.1.1. keeps the shape of the original distribution
2.3.3. k-means
2.3.3.1. K bins, with each case assigned to the closest of K centroids
2.3.3.1.1. favors "similarity" (natural grouping)
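A rough sketch of the three binning strategies above for a single numeric variable, in plain Python. The function names are hypothetical, and the k-means variant uses a simple one-dimensional Lloyd iteration with centroids initialized evenly over the range; that initialization is an assumption, and k-means results depend on it.

```python
def equal_width_bins(values, k):
    """Uniform strategy: split [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k  # assumes the values are not all identical
    # The maximum would fall just past the last interval, so clamp it to bin k - 1.
    return [min(int((x - lo) / width), k - 1) for x in values]


def equal_frequency_bins(values, k):
    """Frequency strategy: rank the cases and cut the ranking into k equal parts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(values)
    return labels


def kmeans_bins(values, k, iterations=100):
    """k-means strategy: assign each case to the nearest of k centroids."""
    lo, hi = min(values), max(values)
    # Initial centroids at the midpoints of k equal-width intervals (an assumption).
    centroids = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(x - centroids[c])) for x in values]
        updated = []
        for c in range(k):
            members = [x for x, label in zip(values, labels) if label == c]
            # An empty cluster keeps its previous centroid.
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels


# Made-up sample: a dense group, a small group, and one isolated value.
sample = [1, 2, 3, 4, 5, 6, 20, 21, 60]
by_width = equal_width_bins(sample, 3)
by_count = equal_frequency_bins(sample, 3)
by_kmeans = kmeans_bins(sample, 3)
```

On this sample, equal-frequency binning splits the dense group 1 to 6 across two bins to balance the counts, equal-width binning separates 20 from 21 because the cut falls between them, and k-means recovers the three visible groups.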