Commit b9f6ac4

ndawe authored and arjoly committed
min_weight_fraction_leaf: narrative doc update
1 parent f62a6d1 commit b9f6ac4

1 file changed: +11 -7 lines changed

doc/modules/tree.rst

Lines changed: 11 additions & 7 deletions
@@ -320,15 +320,19 @@ Tips on practical use
 create arbitrary small leaves, though ``min_samples_split`` is more common
 in the literature.

-* Balance your dataset before training to prevent the tree from creating a
-tree biased toward the classes that are dominant. Balance the dataset by
+* Balance your dataset before training to prevent the tree from being biased
+toward the classes that are dominant. Class balancing can be done by
 sampling an equal number of samples from each class, or preferably by
 normalizing the sum of the sample weights (``sample_weight``) for each
-class to the same value. Then use ``min_weight_fraction_leaf`` instead of
-``min_samples_leaf`` to control the leaf node sizes.
-``min_weight_fraction_leaf`` will ensure that leaf nodes contain at least
-some fraction of the overall sum of the sample weights and will not be
-biased toward the dominant classes like ``min_samples_leaf``.
+class to the same value. Also note that weight-based pre-pruning criteria,
+such as ``min_weight_fraction_leaf``, will then be less biased toward
+dominant classes than criteria that are not aware of the sample weights,
+like ``min_samples_leaf``.
+
+* If the samples are weighted, it will be easier to optimize the tree
+structure using weight-based pre-pruning criterion such as
+``min_weight_fraction_leaf``, which ensure that leaf nodes contain at least
+a fraction of the overall sum of the sample weights.

 * All decision trees use ``np.float32`` arrays internally.
 If training data is not in this format, a copy of the dataset will be made.
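
The balancing advice in the changed text can be illustrated with a small, library-free sketch. It is not part of the commit. The helper names `balanced_sample_weights` and `leaf_weight_fraction`, the 90/10 dataset, and the 0.05 threshold are all hypothetical, chosen only to show why a weight-based criterion treats a minority-class leaf differently from a count-based one once the per-class weight sums are equalized.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Return one weight per sample so that every class has the same total weight."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # Each class ends up with a total weight of n_samples / n_classes.
    return [n_samples / (n_classes * counts[label]) for label in labels]


def leaf_weight_fraction(leaf_indices, weights):
    """Fraction of the overall sum of sample weights that falls in a candidate leaf."""
    return sum(weights[i] for i in leaf_indices) / sum(weights)


# Hypothetical imbalanced dataset: 90 samples of class 0, 10 samples of class 1.
labels = [0] * 90 + [1] * 10
weights = balanced_sample_weights(labels)

# Two candidate leaves with the same sample count (5 each), so a count-based
# criterion such as min_samples_leaf cannot tell them apart.
majority_leaf = range(0, 5)    # five class-0 samples
minority_leaf = range(90, 95)  # five class-1 samples

majority_fraction = leaf_weight_fraction(majority_leaf, weights)
minority_fraction = leaf_weight_fraction(minority_leaf, weights)

# Illustrative threshold playing the role of min_weight_fraction_leaf.
threshold = 0.05
majority_ok = majority_fraction >= threshold
minority_ok = minority_fraction >= threshold
```

Under these assumptions the minority-class leaf carries a much larger share of the total weight than the majority-class leaf, so only the former clears the threshold. In scikit-learn itself, such weights would be passed as ``sample_weight`` to the tree's ``fit`` method, with ``min_weight_fraction_leaf`` set on the estimator.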
