@@ -165,7 +165,8 @@ amount of time (e.g., on large datasets).
165165
166166 * :ref: `example_ensemble_plot_forest_iris.py `
167167 * :ref: `example_ensemble_plot_forest_importances_faces.py `
168- * :ref: `example_ensemble_plot_forest_multioutput.py `
168+ * :ref: `example_ensemble_plot_forest_multioutput.py `
169+
169170
170171.. topic :: References
171172
@@ -177,9 +178,6 @@ amount of time (e.g., on large datasets).
177178 trees", Machine Learning, 63(1), 3-42, 2006.
178179
179180
180- .. _gradient_boosting :
181-
182-
183181 Feature importance evaluation
184182-----------------------------
185183
@@ -219,6 +217,8 @@ the matching feature to the prediction function.
219217 * :ref: `example_ensemble_plot_forest_importances.py `
220218
221219
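A minimal sketch of reading the importances off a fitted forest (the data set,
parameter values, and variable names here are only illustrative)::

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    iris = load_iris()
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(iris.data, iris.target)

    # one importance value per feature; higher means the feature was used
    # for more informative splits across the ensemble
    print(forest.feature_importances_)
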
220+ .. _gradient_boosting :
221+
222222Gradient Tree Boosting
223223======================
224224
@@ -284,11 +284,10 @@ that controls overfitting via :ref:`shrinkage <gradient_boosting_shrinkage>`.
284284Regression
285285----------
286286
287- :class: `GradientBoostingRegressor ` supports a number of different loss
288- functions for regression which can be specified via the argument
289- ``loss ``. Currently, supported are least squares (``loss='ls' ``) and
290- least absolute deviation (``loss='lad' ``), which is more robust w.r.t.
291- outliers. See [F2001 ]_ for detailed information.
287+ :class: `GradientBoostingRegressor ` supports a number of
288+ :ref: `different loss functions <gradient_boosting_loss >`
289+ for regression, which can be specified via the argument
290+ ``loss ``; it defaults to least squares (``'ls' ``).
292291
293292::
294293
@@ -378,6 +377,7 @@ Where the step length :math:`\gamma_m` is chosen using line search:
378377 The algorithms for regression and classification
379378only differ in the concrete loss function used.
380379
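For example, for the least squares and least absolute deviation losses the
quantity each new tree is fitted to, the negative gradient of the loss, is
respectively the plain residual and its sign (a short illustration, using the
conventional :math:`1/2` scaling for the squared loss):

.. math::

  -\frac{\partial}{\partial F(x_i)} \tfrac{1}{2} \left(y_i - F(x_i)\right)^2 = y_i - F(x_i),
  \qquad
  -\frac{\partial}{\partial F(x_i)} \left|y_i - F(x_i)\right| = \mathrm{sign}\left(y_i - F(x_i)\right)
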
380+ .. _gradient_boosting_loss :
381381
382382Loss Functions
383383...............
@@ -393,6 +393,13 @@ the parameter ``loss``:
393393 * Least absolute deviation (``'lad' ``): A robust loss function for
394394 regression. The initial model is given by the median of the
395395 target values.
396+ * Huber (``'huber' ``): Another robust loss function that combines
397+ least squares and least absolute deviation; use ``alpha `` to
398+ control the sensitivity w.r.t. outliers (see [F2001 ]_ for more
399+ details).
400+ * Quantile (``'quantile' ``): A loss function for quantile regression.
401+ Use ``0 < alpha < 1 `` to specify the quantile. This loss function
402+ can be used to create prediction intervals (see the sketch below).
396403
397404 * Classification
398405
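As a rough sketch of the prediction-interval use of the quantile loss mentioned
in the list above (the data set and parameter values are only illustrative)::

    from sklearn.datasets import make_friedman1
    from sklearn.ensemble import GradientBoostingRegressor

    X, y = make_friedman1(n_samples=1200, noise=1.0, random_state=0)
    X_train, y_train = X[:1000], y[:1000]
    X_test = X[1000:]

    # one model per quantile; together they bracket an ~80% prediction interval
    upper = GradientBoostingRegressor(loss='quantile', alpha=0.9,
                                      n_estimators=100).fit(X_train, y_train)
    lower = GradientBoostingRegressor(loss='quantile', alpha=0.1,
                                      n_estimators=100).fit(X_train, y_train)

    intervals = list(zip(lower.predict(X_test), upper.predict(X_test)))
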
@@ -443,8 +450,7 @@ Subsampling
443450[F1999 ]_ proposed stochastic gradient boosting, which combines gradient
444451boosting with bootstrap averaging (bagging). At each iteration
445452the base classifier is trained on a fraction ``subsample `` of
446- the available training data.
447- The subsample is drawn without replacement.
453+ the available training data. The subsample is drawn without replacement.
448454A typical value of ``subsample `` is 0.5.
449455
450456The figure below illustrates the effect of shrinkage and subsampling
@@ -458,12 +464,21 @@ does poorly.
458464 :align: center
459465 :scale: 75
460466
467+ For ``subsample < 1 ``, the deviance on the out-of-bag samples in the i-th iteration
468+ is stored in the attribute ``oob_score_[i] ``. Out-of-bag estimates can be
469+ used for model selection (e.g. to determine the optimal number of iterations).
470+
471+ Another strategy to reduce the variance is to subsample the features,
472+ analogous to the random splits in Random Forests. The size of the subsample
473+ can be controlled via the ``max_features `` parameter.
474+
461475
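A minimal sketch combining the two variance-reduction options above,
``subsample`` and ``max_features`` (the data set and parameter values are only
illustrative; the out-of-bag attribute is accessed under the name documented
above)::

    from sklearn.datasets import make_friedman1
    from sklearn.ensemble import GradientBoostingRegressor

    X, y = make_friedman1(n_samples=1000, random_state=0)

    # stochastic gradient boosting: each tree is fit on half of the samples
    # and considers only a subset of the features at each split
    est = GradientBoostingRegressor(n_estimators=100, subsample=0.5,
                                    max_features=3)
    est.fit(X, y)

    # per-iteration out-of-bag deviance (attribute name as documented above),
    # e.g. to pick the number of iterations
    print(est.oob_score_)
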
462476.. topic :: Examples:
463477
464478 * :ref: `example_ensemble_plot_gradient_boosting_regression.py `
465479 * :ref: `example_ensemble_plot_gradient_boosting_regularization.py `
466480
481+
467482.. topic :: References
468483
469484 .. [F2001 ] J. Friedman, "Greedy Function Approximation: A Gradient Boosting Machine",