Commit 3c543bf

Merge pull request scikit-learn#2829 from arjoly/maxfeatures-seamantics
[MRG] Uniformize max_features semantics for extra trees and random forest
2 parents e20aebe + d7c535f commit 3c543bf

File tree: 4 files changed, +5625 −4871 lines
doc/whats_new.rst

Lines changed: 28 additions & 7 deletions
@@ -20,14 +20,14 @@ Changelog
   :class:`ensemble.BaggingRegressor` meta-estimators for ensembling
   any kind of base estimator. See the :ref:`Bagging <bagging>` section of
   the user guide for details and examples. By `Gilles Louppe`_.
-
+
 - Memory improvements of decision trees, by `Arnaud Joly`_.
-
+
 - Decision trees can now be built in best-first manner by using ``max_leaf_nodes``
   as the stopping criteria. Refactored the tree code to use either a
   stack or a priority queue for tree building.
   By `Peter Prettenhofer`_ and `Gilles Louppe`_.
-
+
 - Decision trees can now be fitted on fortran- and c-style arrays, and
   non-continuous arrays without the need to make a copy.
   If the input array has a different dtype than ``np.float32``, a fortran-
@@ -42,24 +42,24 @@ Changelog
 - Changed the internal storage of decision trees to use a struct array.
   This fixed some small bugs, while improving code and providing a small
   speed gain. By `Joel Nothman`_.
-
+
 - Reduce memory usage and overhead when fitting and predicting with forests
   of randomized trees in parallel with ``n_jobs != 1`` by leveraging new
   threading backend of joblib 0.8 and releasing the GIL in the tree fitting
   Cython code. By `Olivier Grisel`_ and `Gilles Louppe`_.

 - Speed improvement of the :mod:`sklearn.ensemble.gradient_boosting` module.
   By `Gilles Louppe`_ and `Peter Prettenhofer`_.
-
+
 - Various enhancements to the :mod:`sklearn.ensemble.gradient_boosting`
   module: a ``warm_start`` argument to fit additional trees,
   a ``max_leaf_nodes`` argument to fit GBM style trees,
   a ``monitor`` fit argument to inspect the estimator during training, and
   refactoring of the verbose code. By `Peter Prettenhofer`_.
-
+
 - Fixed bug in :class:`gradient_boosting.GradientBoostingRegressor` with
   ``loss='huber'``: ``gamma`` might have not been initialized.
-
+
 - Fixed feature importances as computed with a forest of randomized trees
   when fit with ``sample_weight != None`` and/or with ``bootstrap=True``.
   By `Gilles Louppe`_.
@@ -197,6 +197,27 @@ API changes summary
   of alphas was not computed correctly and the scaling with normalize
   was wrong. By `Manoj Kumar`_.

+- Fix wrong maximal number of features drawn (`max_features`) at each split
+  for decision trees, random forests and gradient tree boosting.
+  Previously, the count for the number of drawn features started only after
+  one non constant features in the split. This bug fix will affect
+  computational and generalization performance of those algorithms in the
+  presence of constant features. To get back previous generalization
+  performance, you should modify the value of `max_features`.
+  By `Arnaud Joly`_.
+
+- Fix wrong maximal number of features drawn (`max_features`) at each split
+  for :class:`ensemble.ExtraTreesClassifier` and
+  :class:`ensemble.ExtraTreesRegressor`. Previously, only non constant
+  features in the split was counted as drawn. Now constant features are
+  counted as drawn. Furthermore at least one feature must be non constant
+  in order to make a valid split. This bug fix will affect
+  computational and generalization performance of extra trees in the
+  presence of constant features. To get back previous generalization
+  performance, you should modify the value of `max_features`.
+  By `Arnaud Joly`_.
+
 .. _changes_0_14:

 0.14
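The semantics change described in the new changelog entries can be illustrated with a short, hedged sketch. Under the uniformized behavior, constant features that are drawn at a split count toward ``max_features``, so in data with many constant columns, fewer informative candidates may be examined per split than before; the changelog's suggested remedy is to raise ``max_features``. The dataset, parameter values, and the choice of ``max_features=5`` below are illustrative assumptions, not part of the commit:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.RandomState(0)
X = rng.rand(100, 10)
# Make the first three features constant: under the new semantics these
# count toward max_features whenever they are drawn at a split.
X[:, :3] = 0.0
y = (X[:, 3] > 0.5).astype(int)

# With constant features present, increasing max_features (here to an
# illustrative value of 5) is how the changelog suggests recovering the
# pre-fix effective number of non-constant candidates per split.
clf = ExtraTreesClassifier(n_estimators=10, max_features=5, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```

Since extra trees are grown fully by default and the labels here are a deterministic function of one feature, the training score is expected to be high; the point of the sketch is only the ``max_features`` knob, not the score.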
