2929other in the way to estimate the parameters used to shift and scale each
3030feature.
3131
32- ``QuantileTransformer`` provides a non-linear transformation in which distances
33- between marginal outliers and inliers are shrunk.
32+ ``QuantileTransformer`` provides non-linear transformations in which distances
33+ between marginal outliers and inliers are shrunk. ``PowerTransformer`` provides
34+ non-linear transformations in which data is mapped to a normal distribution to
35+ stabilize variance and minimize skewness.
3436
3537Unlike the previous transformations, normalization refers to a per sample
3638transformation instead of a per feature transformation.
5961from sklearn .preprocessing import StandardScaler
6062from sklearn .preprocessing import RobustScaler
6163from sklearn .preprocessing import Normalizer
62- from sklearn .preprocessing .data import QuantileTransformer
64+ from sklearn .preprocessing import QuantileTransformer
65+ from sklearn .preprocessing import PowerTransformer
6366
6467from sklearn .datasets import fetch_california_housing
6568
8487 MaxAbsScaler ().fit_transform (X )),
8588 ('Data after robust scaling' ,
8689 RobustScaler (quantile_range = (25 , 75 )).fit_transform (X )),
87- ('Data after quantile transformation (uniform pdf)' ,
88- QuantileTransformer (output_distribution = 'uniform' )
89- .fit_transform (X )),
90+ ('Data after power transformation (Box-Cox)' ,
91+ PowerTransformer (method = 'box-cox' ).fit_transform (X )),
9092 ('Data after quantile transformation (gaussian pdf)' ,
9193 QuantileTransformer (output_distribution = 'normal' )
9294 .fit_transform (X )),
95+ ('Data after quantile transformation (uniform pdf)' ,
96+ QuantileTransformer (output_distribution = 'uniform' )
97+ .fit_transform (X )),
9398 ('Data after sample-wise L2 normalizing' ,
94- Normalizer ().fit_transform (X ))
99+ Normalizer ().fit_transform (X )),
95100]
96101
97102# scale the output between 0 and 1 for the colorbar
@@ -286,6 +291,35 @@ def make_plot(item_idx):
286291
287292make_plot (4 )
288293
294+ ##############################################################################
295+ # PowerTransformer (Box-Cox)
296+ # --------------------------
297+ #
298+ # ``PowerTransformer`` applies a power transformation to each
299+ # feature to make the data more Gaussian-like. Currently,
300+ # ``PowerTransformer`` implements the Box-Cox transform. It differs from
301+ # QuantileTransformer (Gaussian output) in that it does not map the
302+ # data to a zero-mean, unit-variance Gaussian distribution. Instead, Box-Cox
303+ # finds the optimal scaling factor to stabilize variance and minimize skewness
304+ # through maximum likelihood estimation. Note that Box-Cox can only be applied
305+ # to positive, non-zero data. Income and number of households happen to be
306+ # strictly positive, but if negative values are present, a constant can be
307+ # added to each feature to shift it into the positive range - this is known as
308+ # the two-parameter Box-Cox transform.
309+
310+ make_plot (5 )
311+
312+ ##############################################################################
313+ # QuantileTransformer (Gaussian output)
314+ # -------------------------------------
315+ #
316+ # ``QuantileTransformer`` has an additional ``output_distribution`` parameter
317+ # allowing to match a Gaussian distribution instead of a uniform distribution.
318+ # Note that this non-parametric transformer introduces saturation artifacts
319+ # for extreme values.
320+
321+ make_plot (6 )
322+
289323###################################################################
290324# QuantileTransformer (uniform output)
291325# ------------------------------------
@@ -302,18 +336,7 @@ def make_plot(item_idx):
302336# any outlier by setting them to the a priori defined range boundaries (0 and
303337# 1).
304338
305- make_plot (5 )
306-
307- ##############################################################################
308- # QuantileTransformer (Gaussian output)
309- # -------------------------------------
310- #
311- # ``QuantileTransformer`` has an additional ``output_distribution`` parameter
312- # allowing to match a Gaussian distribution instead of a uniform distribution.
313- # Note that this non-parametetric transformer introduces saturation artifacts
314- # for extreme values.
315-
316- make_plot (6 )
339+ make_plot (7 )
317340
318341##############################################################################
319342# Normalizer
@@ -326,5 +349,6 @@ def make_plot(item_idx):
326349# transformed data only lie in the positive quadrant. This would not be the
327350# case if some original features had a mix of positive and negative values.
328351
329- make_plot (7 )
352+ make_plot (8 )
353+
330354plt .show ()
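
The new comment block for ``PowerTransformer`` describes Box-Cox as choosing its parameter by maximum likelihood, requiring strictly positive data, and allowing a constant shift for negative values. A minimal pure-Python sketch of those three points follows. It is not the scikit-learn implementation: the function names are hypothetical, and a coarse grid search stands in for whatever optimizer the library actually uses.

```python
import math


def box_cox(values, lmbda):
    """One-parameter Box-Cox transform; only defined for strictly positive data."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lmbda == 0:
        # The lambda -> 0 limit of (x**lambda - 1) / lambda is log(x).
        return [math.log(v) for v in values]
    return [(v ** lmbda - 1.0) / lmbda for v in values]


def box_cox_log_likelihood(values, lmbda):
    """Gaussian profile log-likelihood of lmbda, up to an additive constant."""
    n = len(values)
    transformed = box_cox(values, lmbda)
    mean = sum(transformed) / n
    variance = sum((t - mean) ** 2 for t in transformed) / n
    log_jacobian = (lmbda - 1.0) * sum(math.log(v) for v in values)
    return -0.5 * n * math.log(variance) + log_jacobian


def estimate_lambda(values, grid=None):
    """Pick the lambda with the highest likelihood on a grid (assumed: -2 to 2)."""
    if grid is None:
        grid = [i / 100.0 for i in range(-200, 201)]
    return max(grid, key=lambda lmbda: box_cox_log_likelihood(values, lmbda))


def shifted_box_cox(values, lmbda, shift):
    """Two-parameter variant: add a constant shift before transforming."""
    return box_cox([v + shift for v in values], lmbda)


# Log-normal-like data: the log transform (lambda near 0) should fit best.
skewed = [math.exp(i / 10.0) for i in range(-20, 21)]
best_lambda = estimate_lambda(skewed)
```

For data that is exactly the exponential of a symmetric sample, as in ``skewed`` above, the estimated lambda should land at or very near 0, which corresponds to a plain log transform. The shift in ``shifted_box_cox`` has to be large enough to make every value strictly positive, otherwise ``box_cox`` raises ``ValueError``.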