where :math:`L_i` is the set of observations with category :math:`i` and
:math:`n_i` is the number of observations with category :math:`i`.

.. note::
  In :class:`TargetEncoder`, `fit(X, y).transform(X)` does not equal
  `fit_transform(X, y)`.

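As a minimal sketch of this difference (the toy data and variable names below
are purely illustrative, not part of the API), the two call patterns can be
compared directly::

    import numpy as np
    from sklearn.preprocessing import TargetEncoder

    # Hypothetical toy data: one categorical feature with 20 categories
    # and a continuous target.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 20, size=(200, 1)).astype(str)
    y = rng.normal(size=200)

    enc = TargetEncoder(random_state=0)
    # Cross-fitted encoding of the training data.
    X_cross_fitted = enc.fit_transform(X, y)
    # 'Full data' encoding applied to the same data.
    X_full = enc.fit(X, y).transform(X)

    # The two results generally differ because fit_transform uses cross fitting.
    print(np.allclose(X_cross_fitted, X_full))  # typically False
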
:meth:`~TargetEncoder.fit_transform` internally relies on a :term:`cross fitting`
scheme to prevent target information from leaking into the train-time
representation, especially for non-informative high-cardinality categorical
variables (features with many unique categories where each category appears
only a few times), and to help prevent the downstream model from overfitting
spurious correlations. In :meth:`~TargetEncoder.fit_transform`, the training
data is split into *k* folds (determined by the `cv` parameter) and each fold
is encoded using the encodings learnt from the other *k-1* folds. For this
reason, the training data should always be encoded with
`fit_transform(X_train, y_train)` rather than
`fit(X_train, y_train).transform(X_train)`.

The following diagram shows the :term:`cross fitting` scheme in
:meth:`~TargetEncoder.fit_transform` with the default `cv=5`:

.. image:: ../images/target_encoder_cross_validation.svg
   :width: 600
   :align: center

The :meth:`~TargetEncoder.fit` method does **not** use any :term:`cross fitting`
scheme and learns one encoding on the entire training set. Using it to encode
the training data is discouraged because it can introduce the data leakage
described above; use :meth:`~TargetEncoder.fit_transform` instead.

During :meth:`~TargetEncoder.fit_transform`, the encoder learns the category
encodings from the full training data and stores them in the
:attr:`~TargetEncoder.encodings_` attribute. The intermediate encodings learned
for each fold during the :term:`cross fitting` process are temporary and are
not saved. The stored encodings can then be used to transform test data with
`encoder.transform(X_test)`.

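As a usage sketch (the toy data, split and variable names are hypothetical),
the recommended pattern is therefore to cross-fit the encoder on the training
split and reuse the stored encodings on the test split::

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import TargetEncoder

    # Hypothetical toy data: one categorical feature and a continuous target.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 20, size=(200, 1)).astype(str)
    y = rng.normal(size=200)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    enc = TargetEncoder(cv=5, random_state=0)
    # Encode the training data with the cross fitting scheme.
    X_train_encoded = enc.fit_transform(X_train, y_train)
    # Encode the test data with the stored 'full data' encodings.
    X_test_encoded = enc.transform(X_test)

Note that when :class:`TargetEncoder` is used as a step in a
:class:`~sklearn.pipeline.Pipeline`, fitting the pipeline calls the encoder's
:meth:`~TargetEncoder.fit_transform` on the training data, so the cross fitting
scheme is applied automatically.
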
.. note::
  :class:`TargetEncoder` considers missing values, such as `np.nan` or `None`,