@@ -1051,6 +1051,68 @@ multiplying the gradients (and the hessians) by the sample weights. Note that
the binning stage (specifically the quantiles computation) does not take the
weights into account.

.. _categorical_support_gbdt:

Categorical Features Support
----------------------------

:class:`HistGradientBoostingClassifier` and
:class:`HistGradientBoostingRegressor` have native support for categorical
features: they can consider splits on non-ordered, categorical data.

For datasets with categorical features, using the native categorical support
is often better than relying on one-hot encoding
(:class:`~sklearn.preprocessing.OneHotEncoder`), because one-hot encoding
requires more tree depth to achieve equivalent splits. It is also usually
better to rely on the native categorical support than to treat categorical
features as continuous (ordinal), as happens with ordinal-encoded data,
since categories are nominal quantities where order does not matter.

To enable categorical support, a boolean mask can be passed to the
`categorical_features` parameter, indicating which features are categorical.
In the following, the first feature will be treated as categorical and the
second feature as numerical::

  >>> gbdt = HistGradientBoostingClassifier(categorical_features=[True, False])

Equivalently, one can pass a list of integers indicating the indices of the
categorical features::

  >>> gbdt = HistGradientBoostingClassifier(categorical_features=[0])

The cardinality of each categorical feature should be less than the `max_bins`
parameter, and each categorical feature is expected to be encoded in
`[0, max_bins - 1]`. To that end, it might be useful to pre-process the data
with an :class:`~sklearn.preprocessing.OrdinalEncoder` as done in
:ref:`sphx_glr_auto_examples_ensemble_plot_gradient_boosting_categorical.py`.

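For instance, here is a minimal sketch of such a pre-processing step, assuming
a single string-valued categorical column (the encoder settings below are one
possible choice, not the only one)::

  >>> import numpy as np
  >>> from sklearn.pipeline import make_pipeline
  >>> from sklearn.preprocessing import OrdinalEncoder
  >>> model = make_pipeline(
  ...     OrdinalEncoder(handle_unknown='use_encoded_value',
  ...                    unknown_value=np.nan),
  ...     HistGradientBoostingClassifier(categorical_features=[0]))

With `unknown_value=np.nan`, categories unseen by the encoder are passed on as
missing values, which the gradient boosting model handles as described below.
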
If there are missing values during training, the missing values will be
treated as a proper category. If there are no missing values during training,
then at prediction time, missing values are mapped to the child node that has
the most samples (just like for continuous features). When predicting,
categories that were not seen during fit time will be treated as missing
values.

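For illustration, here is a minimal example on a toy dataset (the data is
assumed for the sake of the example) where missing values are present at fit
time, so that `np.nan` is treated as its own category::

  >>> import numpy as np
  >>> X = np.array([0, 1, 2, np.nan]).reshape(-1, 1)
  >>> y = [0, 0, 1, 1]
  >>> gbdt = HistGradientBoostingClassifier(
  ...     min_samples_leaf=1,
  ...     categorical_features=[0]).fit(X, y)
  >>> gbdt.predict(X)
  array([0, 0, 1, 1])
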
**Split finding with categorical features**: The canonical way of considering
categorical splits in a tree is to consider all of the :math:`2^{K - 1} - 1`
partitions, where :math:`K` is the number of categories. This can quickly
become prohibitive when :math:`K` is large. Fortunately, since gradient
boosting trees are always regression trees (even for classification
problems), there exists a faster strategy that can yield equivalent splits.
First, the categories of a feature are sorted according to the variance of
the target within each category. Once the categories are sorted, one can
consider *continuous partitions*, i.e. treat the categories as if they were
ordered continuous values (see Fisher [Fisher1958]_ for a formal proof). As a
result, only :math:`K - 1` splits need to be considered instead of
:math:`2^{K - 1} - 1`. The initial sorting is a :math:`\mathcal{O}(K \log(K))`
operation, leading to a total complexity of
:math:`\mathcal{O}(K \log(K) + K)`, instead of :math:`\mathcal{O}(2^K)`.

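The following sketch illustrates the idea on raw target values, ordering
categories by their mean target (one natural ordering); it is not
scikit-learn's actual implementation, which operates on the gradients and
hessians of the boosting iteration rather than on `y` directly::

  import numpy as np

  def best_categorical_split(x, y):
      # Sort the categories by the mean target value within each category.
      categories = np.unique(x)
      means = np.array([y[x == c].mean() for c in categories])
      order = categories[np.argsort(means)]
      # Scan only the K - 1 contiguous partitions of the sorted categories,
      # keeping the one with the lowest total squared error.
      best_loss, best_left = np.inf, None
      for i in range(1, len(order)):
          left = np.isin(x, order[:i])
          loss = ((y[left] - y[left].mean()) ** 2).sum()
          loss += ((y[~left] - y[~left].mean()) ** 2).sum()
          if loss < best_loss:
              best_loss = loss
              best_left = sorted(int(c) for c in order[:i])
      return best_left

  x = np.array([0, 0, 1, 1, 2, 2, 3, 3])
  y = np.array([5., 6., 0., 1., 5., 7., 0., 2.])
  print(best_categorical_split(x, y))  # [1, 3]: the low-target categories
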
.. topic:: Examples:

  * :ref:`sphx_glr_auto_examples_ensemble_plot_gradient_boosting_categorical.py`

.. _monotonic_cst_gbdt:

Monotonic Constraints
@@ -1092,6 +1154,10 @@ that the feature is supposed to have a positive / negative effect on the
probability to belong to the positive class. Monotonic constraints are not
supported for multiclass classification.

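For instance, a minimal sketch for classification (assuming a dataset with
two features; the constraint values are chosen purely for illustration): the
first feature is constrained to have a positive effect on the probability of
the positive class, and the second a negative effect::

  >>> gbdt = HistGradientBoostingClassifier(monotonic_cst=[1, -1])
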
.. note::
    Since categories are unordered quantities, it is not possible to enforce
    monotonic constraints on categorical features.

.. topic:: Examples:

  * :ref:`sphx_glr_auto_examples_ensemble_plot_monotonic_constraints.py`
@@ -1158,6 +1224,8 @@ Finally, many parts of the implementation of
   .. [LightGBM] Ke et al. `"LightGBM: A Highly Efficient Gradient
      Boosting Decision Tree" <https://papers.nips.cc/paper/
      6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree>`_

   .. [Fisher1958] Walter D. Fisher. `"On Grouping for Maximum Homogeneity"
      <http://www.csiss.org/SPACE/workshops/2004/SAC/files/fisher.pdf>`_

.. _voting_classifier:
