@@ -1855,25 +1855,53 @@ See concept :term:`sample property`.
1855 1855 See :ref:`group_cv`.
18561856
1857 1857 ``sample_weight``
1858- A relative weight for each sample. Intuitively, if all weights are
1859- integers, a weighted model or score should be equivalent to that
1860- calculated when repeating the sample the number of times specified in
1861- the weight. Weights may be specified as floats, so that sample weights
1862- are usually equivalent up to a constant positive scaling factor.
1863-
1864- .. FIXME: Is this interpretation always the case in practice? We have no common tests.
1865-
1866- Some estimators, such as decision trees, support negative weights.
1867-
1868- .. FIXME: This feature or its absence may not be tested or documented in many estimators.
1869-
1870- This is not entirely the case where other parameters of the model
1871- consider the number of samples in a region, as with ``min_samples`` in
1872- :class:`cluster.DBSCAN`. In this case, a count of samples becomes
1873- to a sum of their weights.
1874-
1875- In classification, sample weights can also be specified as a function
1876- of class with the :term:`class_weight` estimator :term:`parameter`.
1858+ A weight for each data point. Intuitively, if all weights are integers,
1859+ using them in an estimator or a :term:`scorer` is like duplicating each
1860+ data point as many times as the weight value. Weights can also be
1861+ specified as floats, and can have the same effect as above, as many
estimators and scorers are scale invariant. For example, weights ``[1,
1863+ 2, 3]`` would be equivalent to weights ``[0.1, 0.2, 0.3]`` as they
1864+ differ by a constant factor of 10. Note however that several estimators
1865+ are not invariant to the scale of weights.
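The repetition and rescaling equivalences above can be checked with a small ordinary least squares fit, which is exactly scale invariant (a minimal sketch with made-up data, not part of the diff itself):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 4.0])
w = np.array([1, 2, 3])

# Fit with integer weights.
weighted = LinearRegression().fit(X, y, sample_weight=w)

# Fit with each data point repeated as many times as its weight.
repeated = LinearRegression().fit(np.repeat(X, w, axis=0), np.repeat(y, w))

# Rescaled weights give the same fit, since they only scale the
# weighted least squares objective by a constant factor.
rescaled = LinearRegression().fit(X, y, sample_weight=w / 10.0)

print(np.allclose(weighted.coef_, repeated.coef_))  # True
print(np.allclose(weighted.coef_, rescaled.coef_))  # True
```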
1866+
1867+ `sample_weight` can be passed both as an argument of the estimator's
1868+ :term:`fit` method for model training and as a parameter of a
1869+ :term:`scorer` for model evaluation. These callables are said to
1870+ *consume* the sample weights, while other components of scikit-learn
1871+ can *route* the weights to the underlying estimators or scorers (see
1872+ :ref:`glossary_metadata_routing`).
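Both consumption points can be sketched on a toy dataset (illustrative numbers only): the weights are consumed once by ``fit`` and again by the metric.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
w = np.array([1.0, 1.0, 2.0, 2.0])

# `fit` consumes the weights during training...
clf = LogisticRegression().fit(X, y, sample_weight=w)

# ...and a metric can consume them again during evaluation, so that
# heavily weighted points contribute more to the reported score.
score = accuracy_score(y, clf.predict(X), sample_weight=w)
print(score)
```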
1873+
1874+ Weighting samples can be useful in several contexts. For instance, if
1875+ the training data is not uniformly sampled from the target population,
1876+ it can be corrected by weighting the training data points based on the
1877+ `inverse probability
1878+ <https://en.wikipedia.org/wiki/Inverse_probability_weighting>`_ of
1879+ their selection for training (e.g. inverse propensity weighting).
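The reweighting itself is just the reciprocal of the selection probability; a minimal sketch with made-up probabilities:

```python
import numpy as np

# Hypothetical probabilities that each training point was selected
# from the target population (made-up numbers for illustration).
p_select = np.array([0.8, 0.5, 0.1, 0.1])

# Inverse probability weights: rarely selected points count for more,
# correcting the non-uniform sampling of the training set.
sample_weight = 1.0 / p_select
print(sample_weight)
```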
1880+
1881+ Some model hyper-parameters are expressed in terms of a discrete number
1882+ of data points in a region of the feature space. When fitting with
1883+ sample weights, a count of data points is often automatically converted
1884+ to a sum of their weights, but this is not always the case. Please
1885+ refer to the model docstring for details.
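One model where the conversion does happen is :class:`cluster.DBSCAN`, whose docstring states that a point whose weight alone reaches ``min_samples`` is a core sample by itself. A minimal sketch (toy data):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# A single isolated point can never satisfy min_samples=5 by count...
X = np.array([[0.0, 0.0]])
unweighted = DBSCAN(eps=0.5, min_samples=5).fit(X).labels_

# ...but DBSCAN compares the *sum of weights* in a neighborhood to
# min_samples, so a weight of 5 makes the point a core sample.
weighted = DBSCAN(eps=0.5, min_samples=5).fit(X, sample_weight=[5.0]).labels_

print(unweighted)  # [-1]  (noise)
print(weighted)    # [0]   (core sample, forms its own cluster)
```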
1886+
1887+ In classification, weights can also be specified for all samples
1888+ belonging to a given target class with the :term:`class_weight`
1889+ estimator :term:`parameter`. If both ``sample_weight`` and
1890+ ``class_weight`` are provided, the final weight assigned to a sample is
1891+ the product of the two.
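The product rule can be verified by folding the class weights into the sample weights by hand (a sketch with made-up data and weights):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
w = np.array([1.0, 2.0, 1.0, 2.0])

# Pass sample weights and class weights separately...
a = LogisticRegression(class_weight={0: 3.0, 1: 1.0}).fit(X, y, sample_weight=w)

# ...or multiply each sample's weight by its class's weight up front:
# the effective per-sample weight is the product of the two.
b = LogisticRegression().fit(X, y, sample_weight=w * np.where(y == 0, 3.0, 1.0))

print(np.allclose(a.coef_, b.coef_))  # True
```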
1892+
1893+ At the time of writing (version 1.8), not all scikit-learn estimators
1894+ correctly implement the weight-repetition equivalence property. The
1895+ `#16298 meta issue
1896+ <https://github.com/scikit-learn/scikit-learn/issues/16298>`_ tracks
1897+ ongoing work to detect and fix remaining discrepancies.
1898+
1899+ Furthermore, some estimators have a stochastic fit method. For
1900+ instance, :class:`cluster.KMeans` depends on a random initialization,
1901+ bagging models randomly resample from the training data, etc. In this
1902+ case, the sample weight-repetition equivalence property described above
1903+ does not hold exactly. However, it should hold at least in expectation
1904+ over the randomness of the fitting procedure.
1877 1905
1878 1906 ``X``
1879 1907 Denotes data that is observed at training and prediction time, used as