
Commit eaab848

DOC: Clarify recommended usage of fit_transform() vs fit().transform() in TargetEncoder (scikit-learn#32347)

1 parent: 3eb55e2

File tree: 4 files changed, +42 −24 lines changed


doc/modules/preprocessing.rst

Lines changed: 21 additions & 18 deletions

@@ -936,34 +936,37 @@ cardinality categories are location based such as zip code or region.
 where :math:`L_i` is the set of observations with category :math:`i` and
 :math:`n_i` is the number of observations with category :math:`i`.
 
+.. note::
+    In :class:`TargetEncoder`, `fit(X, y).transform(X)` does not equal `fit_transform(X, y)`.
 
 :meth:`~TargetEncoder.fit_transform` internally relies on a :term:`cross fitting`
 scheme to prevent target information from leaking into the train-time
 representation, especially for non-informative high-cardinality categorical
-variables, and help prevent the downstream model from overfitting spurious
-correlations. Note that as a result, `fit(X, y).transform(X)` does not equal
-`fit_transform(X, y)`. In :meth:`~TargetEncoder.fit_transform`, the training
-data is split into *k* folds (determined by the `cv` parameter) and each fold is
-encoded using the encodings learnt using the other *k-1* folds. The following
-diagram shows the :term:`cross fitting` scheme in
+variables (features with many unique categories where each category appears
+only a few times), and help prevent the downstream model from overfitting spurious
+correlations. In :meth:`~TargetEncoder.fit_transform`, the training data is split into
+*k* folds (determined by the `cv` parameter) and each fold is encoded using the
+encodings learnt using the *other k-1* folds. For this reason, training data should
+always be trained and transformed with `fit_transform(X_train, y_train)`.
+
+This diagram shows the :term:`cross fitting` scheme in
 :meth:`~TargetEncoder.fit_transform` with the default `cv=5`:
 
 .. image:: ../images/target_encoder_cross_validation.svg
   :width: 600
   :align: center
 
-:meth:`~TargetEncoder.fit_transform` also learns a 'full data' encoding using
-the whole training set. This is never used in
-:meth:`~TargetEncoder.fit_transform` but is saved to the attribute `encodings_`,
-for use when :meth:`~TargetEncoder.transform` is called. Note that the encodings
-learned for each fold during the :term:`cross fitting` scheme are not saved to
-an attribute.
-
-The :meth:`~TargetEncoder.fit` method does **not** use any :term:`cross fitting`
-schemes and learns one encoding on the entire training set, which is used to
-encode categories in :meth:`~TargetEncoder.transform`.
-This encoding is the same as the 'full data'
-encoding learned in :meth:`~TargetEncoder.fit_transform`.
+The :meth:`~TargetEncoder.fit` method does **not** use any :term:`cross fitting` schemes
+and learns one encoding on the entire training set. It is discouraged to use this
+method because it can introduce data leakage as mentioned above. Use
+:meth:`~TargetEncoder.fit_transform` instead.
+
+During :meth:`~TargetEncoder.fit_transform`, the encoder learns category
+encodings from the full training data and stores them in the
+:attr:`~TargetEncoder.encodings_` attribute. The intermediate encodings learned
+for each fold during the :term:`cross fitting` process are temporary and not
+saved. The stored encodings can then be used to transform test data with
+`encoder.transform(X_test)`.
 
 .. note::
     :class:`TargetEncoder` considers missing values, such as `np.nan` or `None`,

examples/preprocessing/plot_target_encoder.py

Lines changed: 1 addition & 1 deletion

@@ -13,7 +13,7 @@
 .. note::
     `fit(X, y).transform(X)` does not equal `fit_transform(X, y)` because a
     cross fitting scheme is used in `fit_transform` for encoding. See the
-    :ref:`User Guide <target_encoder>`. for details.
+    :ref:`User Guide <target_encoder>` for details.
 """
 
 # Authors: The scikit-learn developers

examples/preprocessing/plot_target_encoder_cross_val.py

Lines changed: 2 additions & 2 deletions

@@ -11,7 +11,7 @@
 and the target. To prevent overfitting, :meth:`TargetEncoder.fit_transform` uses
 an internal :term:`cross fitting` scheme to encode the training data to be used
 by a downstream model. This scheme involves splitting the data into *k* folds
-and encoding each fold using the encodings learnt using the other *k-1* folds.
+and encoding each fold using the encodings learnt using the *other k-1* folds.
 In this example, we demonstrate the importance of the cross
 fitting procedure to prevent overfitting.
 """
@@ -140,7 +140,7 @@
 # %%
 # While :meth:`TargetEncoder.fit_transform` uses an internal
 # :term:`cross fitting` scheme to learn encodings for the training set,
-# :meth:`TargetEncoder.transform` itself does not.
+# :meth:`TargetEncoder.fit` followed by :meth:`TargetEncoder.transform` does not.
 # It uses the complete training set to learn encodings and to transform the
 # categorical features. Thus, we can use :meth:`TargetEncoder.fit` followed by
 # :meth:`TargetEncoder.transform` to disable the :term:`cross fitting`. This

sklearn/preprocessing/_target_encoder.py

Lines changed: 18 additions & 3 deletions

@@ -218,6 +218,14 @@ def __init__(
     def fit(self, X, y):
         """Fit the :class:`TargetEncoder` to X and y.
 
+        It is discouraged to use this method because it can introduce data leakage.
+        Use `fit_transform` on training data instead.
+
+        .. note::
+            `fit(X, y).transform(X)` does not equal `fit_transform(X, y)` because a
+            :term:`cross fitting` scheme is used in `fit_transform` for encoding.
+            See the :ref:`User Guide <target_encoder>` for details.
+
         Parameters
         ----------
         X : array-like of shape (n_samples, n_features)
@@ -236,12 +244,16 @@ def fit(self, X, y):
 
     @_fit_context(prefer_skip_nested_validation=True)
     def fit_transform(self, X, y):
-        """Fit :class:`TargetEncoder` and transform X with the target encoding.
+        """Fit :class:`TargetEncoder` and transform `X` with the target encoding.
+
+        This method uses a :term:`cross fitting` scheme to prevent target leakage
+        and overfitting in downstream predictors. It is the recommended method for
+        encoding training data.
 
         .. note::
             `fit(X, y).transform(X)` does not equal `fit_transform(X, y)` because a
             :term:`cross fitting` scheme is used in `fit_transform` for encoding.
-            See the :ref:`User Guide <target_encoder>`. for details.
+            See the :ref:`User Guide <target_encoder>` for details.
 
         Parameters
         ----------
@@ -314,10 +326,13 @@ def fit_transform(self, X, y):
     def transform(self, X):
         """Transform X with the target encoding.
 
+        This method internally uses the `encodings_` attribute learnt during
+        :meth:`TargetEncoder.fit_transform` to transform test data.
+
        .. note::
            `fit(X, y).transform(X)` does not equal `fit_transform(X, y)` because a
            :term:`cross fitting` scheme is used in `fit_transform` for encoding.
-           See the :ref:`User Guide <target_encoder>`. for details.
+           See the :ref:`User Guide <target_encoder>` for details.
 
        Parameters
        ----------
