1 file changed, +9 -2 lines changed
@@ -320,8 +320,15 @@ Tips on practical use
    create arbitrary small leaves, though ``min_samples_split`` is more common
    in the literature.

- * Balance your dataset before training to prevent the tree from creating
-   a tree biased toward the classes that are dominant.
+ * Balance your dataset before training to prevent the tree from creating a
+   tree biased toward the classes that are dominant. Balance the dataset by
+   sampling an equal number of samples from each class, or preferably by
+   normalizing the sum of the sample weights (``sample_weight``) for each
+   class to the same value. Then use ``min_weight_fraction_leaf`` instead of
+   ``min_samples_leaf`` to control the leaf node sizes.
+   ``min_weight_fraction_leaf`` will ensure that leaf nodes contain at least
+   some fraction of the overall sum of the sample weights and will not be
+   biased toward the dominant classes like ``min_samples_leaf``.

  * All decision trees use ``np.float32`` arrays internally.
    If training data is not in this format, a copy of the dataset will be made.
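
The weight normalization described in the added tip could be sketched roughly as follows. This is an illustration only, not code from the change: the helper name ``balanced_sample_weights``, the toy labels, and the value ``0.05`` are assumptions made for the example. The weighting formula (total samples divided by the number of classes times the class count) is one common way to give every class the same total weight.

```python
from collections import Counter


def balanced_sample_weights(y):
    """Return one weight per sample so every class has the same total weight.

    Hypothetical helper: each sample of class c gets
    n_samples / (n_classes * count(c)), so the weights of every class
    sum to n_samples / n_classes.
    """
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return [n_samples / (n_classes * counts[label]) for label in y]


# Made-up imbalanced labels: 90 samples of class 0, 10 of class 1.
y = [0] * 90 + [1] * 10
weights = balanced_sample_weights(y)

# Sum the weights per class; both classes now carry the same total weight.
class_totals = {label: 0.0 for label in set(y)}
for label, w in zip(y, weights):
    class_totals[label] += w

# min_weight_fraction_leaf is a fraction of the total sample weight, so the
# smallest admissible leaf weight is that fraction times sum(weights).
min_weight_fraction_leaf = 0.05  # illustrative value, not a recommendation
min_leaf_weight = min_weight_fraction_leaf * sum(weights)

# With scikit-learn installed, the weights would be passed at fit time:
#   clf = DecisionTreeClassifier(min_weight_fraction_leaf=0.05)
#   clf.fit(X, y, sample_weight=weights)
```

With these toy numbers each minority sample weighs about nine times as much as a majority sample, so a leaf reaches the weight threshold with a single minority sample or roughly nine majority samples. A count-based ``min_samples_leaf`` would treat both kinds of sample the same.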