
Commit 5fa2f9c

DOC: copyedit log loss, hint how multiclass generalizes binary
Changed notation to use y for the true label and p for the prediction, to be more consistent with the rest of the docs.
1 parent 2f07590

File tree

2 files changed, +17 -11 lines changed

doc/modules/model_evaluation.rst

Lines changed: 16 additions & 10 deletions
@@ -681,29 +681,35 @@ as well as some variants of expectation-maximization,
 and can be used to evaluate the probability outputs (``predict_proba``)
 of a classifier, rather than its discrete predictions.
 
-For binary classification with a true label :math:`y_t \in \{0,1\}`
-and a probability estimate :math:`y_p = P(y_t = 1)`,
+For binary classification with a true label :math:`y \in \{0,1\}`
+and a probability estimate :math:`p = \operatorname{Pr}(y = 1)`,
 the log loss per sample is the negative log-likelihood
 of the classifier given the true label:
 
 .. math::
 
-    L_{\log}(y_t, y_p) = -\log P(y_t|y_p) = -(y_t \log y_p + (1 - y_t) \log (1 - y_p))
+    L_{\log}(y, p) = -\log \operatorname{Pr}(y|p) = -(y \log p + (1 - y) \log (1 - p))
 
 This extends to the multiclass case as follows.
 Let the true labels for a set of samples
-be encoded as a 1-of-K binary indicator matrix :math:`T`,
-i.e. :math:`t_{i,k} = 1` if sample :math:`i` has label :math:`k`
+be encoded as a 1-of-K binary indicator matrix :math:`Y`,
+i.e. :math:`y_{i,k} = 1` if sample :math:`i` has label :math:`k`
 taken from a set of :math:`K` labels.
-Let :math:`Y` be a matrix of probability estimates,
-with :math:`y_{i,k} = P(t_{i,k} = 1)`.
-Then the total log loss of the whole set is
+Let :math:`P` be a matrix of probability estimates,
+with :math:`p_{i,k} = \operatorname{Pr}(y_{i,k} = 1)`.
+Then the log loss of the whole set is
 
 .. math::
 
-    L_{\log}(T, Y) = -\log P(T|Y) = - \frac{1}{N} \sum_{i=0}^{N-1} \sum_{k=0}^{K-1} t_{i,k} \log y_{i,k}
+    L_{\log}(Y, P) = -\log \operatorname{Pr}(Y|P) = - \frac{1}{N} \sum_{i=0}^{N-1} \sum_{k=0}^{K-1} y_{i,k} \log p_{i,k}
 
-The function :func:`log_loss` computes either total or mean log loss
+To see how this generalizes the binary log loss given above,
+note that in the binary case,
+:math:`p_{i,0} = 1 - p_{i,1}` and :math:`y_{i,0} = 1 - y_{i,1}`,
+so expanding the inner sum over :math:`y_{i,k} \in \{0,1\}`
+gives the binary log loss.
+
+The function :func:`log_loss` computes log loss
 given a list of ground-truth labels and a probability matrix,
 as returned by an estimator's ``predict_proba`` method.
 
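A quick sanity check, not part of the commit: a minimal Python sketch, assuming a recent scikit-learn API in which ``log_loss`` accepts a 1-D array of positive-class probabilities, confirming that the multiclass formula with K = 2 reduces to the binary one:

    import numpy as np
    from sklearn.metrics import log_loss

    y = np.array([0, 1, 1, 0])          # true binary labels y in {0, 1}
    p = np.array([0.1, 0.8, 0.7, 0.4])  # estimated Pr(y = 1) per sample

    # Binary formula: mean of -(y log p + (1 - y) log(1 - p)).
    binary = np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p)))

    # Multiclass formula with K = 2: indicator matrix Y and probability
    # matrix P with columns (1 - p, p); the inner sum over k picks out
    # the column where y_{i,k} = 1, reproducing the binary loss.
    Y = np.column_stack([1 - y, y])
    P = np.column_stack([1 - p, p])
    multiclass = np.mean(-(Y * np.log(P)).sum(axis=1))

    assert np.isclose(binary, multiclass)
    assert np.isclose(binary, log_loss(y, p))  # agrees with the library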

sklearn/metrics/metrics.py

Lines changed: 1 addition & 1 deletion
@@ -1072,7 +1072,7 @@ def log_loss(y_true, y_pred, eps=1e-15, normalize=True):
 
     normalize : bool, optional (default=True)
         If true, return the mean loss per sample.
-        Otherwise, return the total loss.
+        Otherwise, return the sum of the per-sample losses.
 
     Returns
     -------
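For illustration (a hypothetical usage sketch, not part of the commit; the ``normalize`` flag still exists in current scikit-learn), the reworded behavior corresponds to:

    from sklearn.metrics import log_loss

    y_true = [0, 1, 1]
    y_pred = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]

    mean_loss = log_loss(y_true, y_pred)                  # mean loss per sample
    sum_loss = log_loss(y_true, y_pred, normalize=False)  # sum of per-sample losses
    assert abs(sum_loss - 3 * mean_loss) < 1e-12          # 3 samples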
