@@ -320,15 +320,19 @@ Tips on practical use
320320    create arbitrary small leaves, though ``min_samples_split`` is more common
321321    in the literature.
322322
323-   * Balance your dataset before training to prevent the tree from creating a 
324-     tree biased  toward the classes that are dominant. Balance the dataset  by
323+   * Balance your dataset before training to prevent the tree from being biased
324+     toward the classes that are dominant. Class balancing can be done by
325325    sampling an equal number of samples from each class, or preferably by
326326    normalizing the sum of the sample weights (``sample_weight``) for each
327-     class to the same value. Then use ``min_weight_fraction_leaf`` instead of
328-     ``min_samples_leaf`` to control the leaf node sizes.
329-     ``min_weight_fraction_leaf`` will ensure that leaf nodes contain at least
330-     some fraction of the overall sum of the sample weights and will not be
331-     biased toward the dominant classes like ``min_samples_leaf``.
327+     class to the same value. Also note that weight-based pre-pruning criteria,
328+     such as ``min_weight_fraction_leaf``, will then be less biased toward
329+     dominant classes than criteria that are not aware of the sample weights,
330+     like ``min_samples_leaf``.
331+ 
332+   * If the samples are weighted, it will be easier to optimize the tree
333+     structure using a weight-based pre-pruning criterion such as
334+     ``min_weight_fraction_leaf``, which ensures that leaf nodes contain at least
335+     a fraction of the overall sum of the sample weights.
332336
333337  * All decision trees use ``np.float32`` arrays internally.
334338    If training data is not in this format, a copy of the dataset will be made.
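
The class-balancing tip in this hunk (normalizing the sum of the sample weights for each class to the same value) can be sketched as follows. This is a minimal illustration rather than part of the patch: the helper name ``balanced_sample_weights`` and the toy labels are hypothetical, and the code uses only the standard library.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Return one weight per sample so that each class's weights sum to the same value.

    Hypothetical helper for illustration only.
    """
    counts = Counter(labels)
    n_classes = len(counts)
    n_samples = len(labels)
    # Each class gets a total weight of n_samples / n_classes,
    # shared equally among the samples of that class.
    return [n_samples / (n_classes * counts[label]) for label in labels]


# Imbalanced toy labels (made-up data): six samples of class 0, two of class 1.
y = [0, 0, 0, 0, 0, 0, 1, 1]
w = balanced_sample_weights(y)

# Sum of the weights per class; the two sums should match up to rounding.
class_totals = {c: sum(wi for wi, yi in zip(w, y) if yi == c) for c in set(y)}
print(class_totals)
```

Weights computed this way could then be passed as ``sample_weight`` to ``fit``, with ``min_weight_fraction_leaf`` set on the estimator so that the leaf-size constraint is expressed as a fraction of the total weight rather than as a raw sample count.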