Commit 532c54c

Merge branch 'master' of github.com:scikit-learn/scikit-learn into tree-mo
Conflicts: sklearn/tree/_tree.c
2 parents: dc8e65a + 1d4c087

File tree: 8 files changed (+390 / -41 lines)

doc/modules/ensemble.rst

Lines changed: 26 additions & 11 deletions
@@ -165,7 +165,8 @@ amount of time (e.g., on large datasets).

   * :ref:`example_ensemble_plot_forest_iris.py`
   * :ref:`example_ensemble_plot_forest_importances_faces.py`
-  * :ref:`example_ensemble_plot_forest_multioutput.py`
+  * :ref:`example_ensemble_plot_forest_multioutput.py`
+

 .. topic:: References

@@ -177,9 +178,6 @@ amount of time (e.g., on large datasets).
    trees", Machine Learning, 63(1), 3-42, 2006.


-.. _gradient_boosting:
-
-
 Feature importance evaluation
 -----------------------------

@@ -219,6 +217,8 @@ the matching feature to the prediction function.
   * :ref:`example_ensemble_plot_forest_importances.py`


+.. _gradient_boosting:
+
 Gradient Tree Boosting
 ======================

@@ -284,11 +284,10 @@ that controls overfitting via :ref:`shrinkage <gradient_boosting_shrinkage>`.
 Regression
 ----------

-:class:`GradientBoostingRegressor` supports a number of different loss
-functions for regression which can be specified via the argument
-``loss``. Currently, supported are least squares (``loss='ls'``) and
-least absolute deviation (``loss='lad'``), which is more robust w.r.t.
-outliers. See [F2001]_ for detailed information.
+:class:`GradientBoostingRegressor` supports a number of
+:ref:`different loss functions <gradient_boosting_loss>`
+for regression which can be specified via the argument
+``loss``, which defaults to least squares (``'ls'``).

 ::

@@ -378,6 +377,7 @@ Where the step length :math:`\gamma_m` is chosen using line search:
 The algorithms for regression and classification
 only differ in the concrete loss function used.

+.. _gradient_boosting_loss:

 Loss Functions
 ...............
@@ -393,6 +393,13 @@ the parameter ``loss``:
   * Least absolute deviation (``'lad'``): A robust loss function for
     regression. The initial model is given by the median of the
     target values.
+  * Huber (``'huber'``): Another robust loss function that combines
+    least squares and least absolute deviation; use ``alpha`` to
+    control the sensitivity w.r.t. outliers (see [F2001]_ for more
+    details).
+  * Quantile (``'quantile'``): A loss function for quantile regression.
+    Use ``0 < alpha < 1`` to specify the quantile. This loss function
+    can be used to create prediction intervals.

 * Classification
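
The bullets added above introduce the ``'huber'`` and ``'quantile'`` losses and their ``alpha`` parameter. A minimal usage sketch, not from this commit; it assumes the ``learn_rate`` parameter spelling used throughout this diff and runs on made-up toy data:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy 1-D regression data with a handful of large outliers (illustrative only).
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)
y[::20] += 5.0  # inject outliers

# Huber loss: ``alpha`` controls the sensitivity w.r.t. outliers.
huber = GradientBoostingRegressor(loss='huber', alpha=0.9,
                                  n_estimators=100, learn_rate=0.1)
huber.fit(X, y)

# Quantile loss: ``alpha`` is the target quantile (here the 90th percentile),
# the building block for prediction intervals.
q90 = GradientBoostingRegressor(loss='quantile', alpha=0.9,
                                n_estimators=100, learn_rate=0.1)
q90.fit(X, y)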

@@ -443,8 +450,7 @@ Subsampling
 [F1999]_ proposed stochastic gradient boosting, which combines gradient
 boosting with bootstrap averaging (bagging). At each iteration
 the base classifier is trained on a fraction ``subsample`` of
-the available training data.
-The subsample is drawn without replacement.
+the available training data. The subsample is drawn without replacement.
 A typical value of ``subsample`` is 0.5.

 The figure below illustrates the effect of shrinkage and subsampling
@@ -458,12 +464,21 @@ does poorly.
    :align: center
    :scale: 75

+For ``subsample < 1``, the deviance on the out-of-bag samples in the i-th iteration
+is stored in the attribute ``oob_score_[i]``. Out-of-bag estimates can be
+used for model selection (e.g. to determine the optimal number of iterations).
+
+Another strategy to reduce the variance is to subsample the features,
+analogous to the random splits in Random Forests. The size of the subsample
+can be controlled via the ``max_features`` parameter.
+

 .. topic:: Examples:

  * :ref:`example_ensemble_plot_gradient_boosting_regression.py`
  * :ref:`example_ensemble_plot_gradient_boosting_regularization.py`

+
 .. topic:: References

  .. [F2001] J. Friedman, "Greedy Function Approximation: A Gradient Boosting Machine",
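
The paragraphs added in this hunk describe three regularization knobs: ``subsample``, the per-iteration out-of-bag deviance in ``oob_score_``, and feature subsampling via ``max_features``. A hedged sketch of combining them, assuming the parameter and attribute names exactly as documented above (``learn_rate``, ``oob_score_``) and synthetic data:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(500, 4))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

# Each tree is fit on half of the samples (drawn without replacement) and
# considers only two of the four features when searching for a split.
clf = GradientBoostingRegressor(n_estimators=200, learn_rate=0.1,
                                subsample=0.5, max_features=2)
clf.fit(X, y)

# With subsample < 1, oob_score_[i] holds the out-of-bag deviance of
# iteration i; its minimum gives a rough estimate of how many iterations
# are actually needed.
n_iter = np.argmin(clf.oob_score_) + 1
print("iterations suggested by the OOB deviance: %d" % n_iter)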

doc/whats_new.rst

Lines changed: 7 additions & 0 deletions
@@ -9,6 +9,13 @@
 Changelog
 ---------

+- :class:`ensemble.GradientBoostingRegressor` and
+  :class:`ensemble.GradientBoostingClassifier` now support feature subsampling
+  via the ``max_features`` argument.
+
+- Added Huber and Quantile loss functions to
+  :class:`ensemble.GradientBoostingRegressor`.
+
 - Added :class:`preprocessing.LabelBinarizer`, a simple utility class to
   normalize labels or transform non-numerical labels, by `Mathieu Blondel`_.

(new example file; path not captured in this view)

Lines changed: 79 additions & 0 deletions

@@ -0,0 +1,79 @@
+"""
+=====================================================
+Prediction Intervals for Gradient Boosting Regression
+=====================================================
+
+This example shows how quantile regression can be used
+to create prediction intervals.
+"""
+
+import numpy as np
+import pylab as pl
+from sklearn.ensemble import GradientBoostingRegressor
+
+
+np.random.seed(1)
+
+
+def f(x):
+    """The function to predict."""
+    return x * np.sin(x)
+
+#----------------------------------------------------------------------
+# Generate noisy observations of f
+X = np.atleast_2d(np.random.uniform(0, 10.0, size=100)).T
+X = X.astype(np.float32)
+
+# Observations
+y = f(X).ravel()
+
+dy = 1.5 + 1.0 * np.random.random(y.shape)
+noise = np.random.normal(0, dy)
+y += noise
+y = y.astype(np.float32)
+
+# Mesh the input space for evaluation of the real function
+# and the predictions
+xx = np.atleast_2d(np.linspace(0, 10, 1000)).T
+xx = xx.astype(np.float32)
+
+alpha = 0.95
+
+clf = GradientBoostingRegressor(loss='quantile', alpha=alpha,
+                                n_estimators=250, max_depth=3,
+                                learn_rate=.1, min_samples_leaf=9,
+                                min_samples_split=9)
+
+clf.fit(X, y)
+
+# Make the prediction on the meshed x-axis (upper quantile)
+y_upper = clf.predict(xx)
+
+clf.set_params(alpha=1.0 - alpha)
+clf.fit(X, y)
+
+# Make the prediction on the meshed x-axis (lower quantile)
+y_lower = clf.predict(xx)
+
+clf.set_params(loss='ls')
+clf.fit(X, y)
+
+# Make the prediction on the meshed x-axis (least-squares estimate)
+y_pred = clf.predict(xx)
+
+# Plot the function, the prediction and the prediction interval
+# given by the two quantile models
+fig = pl.figure()
+pl.plot(xx, f(xx), 'g:', label=u'$f(x) = x\,\sin(x)$')
+pl.plot(X, y, 'b.', markersize=10, label=u'Observations')
+pl.plot(xx, y_pred, 'r-', label=u'Prediction')
+pl.plot(xx, y_upper, 'k-')
+pl.plot(xx, y_lower, 'k-')
+pl.fill(np.concatenate([xx, xx[::-1]]),
+        np.concatenate([y_upper, y_lower[::-1]]),
+        alpha=.5, fc='b', ec='None', label='95% prediction interval')
+pl.xlabel('$x$')
+pl.ylabel('$f(x)$')
+pl.ylim(-10, 20)
+pl.legend(loc='upper left')
+pl.show()
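
A possible follow-up to the example above, not part of the commit: because the upper and lower curves are the 0.95 and 0.05 quantiles, the band's nominal coverage is 90% (the plot label says 95%), and the empirical coverage on the training points can be checked directly. The snippet reuses ``clf``, ``X``, ``y`` and ``alpha`` from the script above.

# Refit the two quantile models and count how many training targets fall
# inside the band (set_params and fit both return the estimator itself).
upper_train = clf.set_params(loss='quantile', alpha=alpha).fit(X, y).predict(X)
lower_train = clf.set_params(alpha=1.0 - alpha).fit(X, y).predict(X)
coverage = np.mean((y >= lower_train) & (y <= upper_train))
print("empirical coverage on the training set: %.2f" % coverage)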

examples/ensemble/plot_gradient_boosting_regularization.py

Lines changed: 15 additions & 9 deletions
@@ -6,10 +6,15 @@
 Illustration of the effect of different regularization strategies
 for Gradient Boosting. The example is taken from Hastie et al. 2009.

-The loss function used is binomial deviance. In combination with
-shrinkage, stochastic gradient boosting (Sample 0.5) can produce
-more accurate models.
+The loss function used is binomial deviance. Regularization via
+shrinkage (``learn_rate < 1.0``) improves performance considerably.
+In combination with shrinkage, stochastic gradient boosting
+(``subsample < 1.0``) can produce more accurate models by reducing the
+variance via bagging.
 Subsampling without shrinkage usually does poorly.
+Another strategy to reduce the variance is to subsample the features,
+analogous to the random splits in Random Forests
+(via the ``max_features`` parameter).

 .. [1] T. Hastie, R. Tibshirani and J. Friedman, "Elements of Statistical
     Learning Ed. 2", Springer, 2009.
@@ -39,12 +44,14 @@

 for label, color, setting in [('No shrinkage', 'orange',
                                {'learn_rate': 1.0, 'subsample': 1.0}),
-                              ('Shrink=0.1', 'turquoise',
+                              ('learn_rate=0.1', 'turquoise',
                                {'learn_rate': 0.1, 'subsample': 1.0}),
-                              ('Sample=0.5', 'blue',
+                              ('subsample=0.5', 'blue',
                                {'learn_rate': 1.0, 'subsample': 0.5}),
-                              ('Shrink=0.1, Sample=0.5', 'gray',
-                               {'learn_rate': 0.1, 'subsample': 0.5})]:
+                              ('learn_rate=0.1, subsample=0.5', 'gray',
+                               {'learn_rate': 0.1, 'subsample': 0.5}),
+                              ('learn_rate=0.1, max_features=2', 'magenta',
+                               {'learn_rate': 0.1, 'max_features': 2})]:
     params = dict(original_params)
     params.update(setting)

@@ -57,10 +64,9 @@
     for i, y_pred in enumerate(clf.staged_decision_function(X_test)):
         test_deviance[i] = clf.loss_(y_test, y_pred)

-    pl.plot(np.arange(test_deviance.shape[0]) + 1, test_deviance, '-',
+    pl.plot((np.arange(test_deviance.shape[0]) + 1)[::5], test_deviance[::5], '-',
             color=color, label=label)

-pl.title('Deviance')
 pl.legend(loc='upper left')
 pl.xlabel('Boosting Iterations')
 pl.ylabel('Test Set Deviance')

sklearn/ensemble/__init__.py

Lines changed: 0 additions & 1 deletion
@@ -8,6 +8,5 @@
 from .forest import RandomForestRegressor
 from .forest import ExtraTreesClassifier
 from .forest import ExtraTreesRegressor
-
 from .gradient_boosting import GradientBoostingClassifier
 from .gradient_boosting import GradientBoostingRegressor
