@@ -681,29 +681,35 @@ as well as some variants of expectation-maximization,
681681and can be used to evaluate the probability outputs (``predict_proba``)
682682of a classifier, rather than its discrete predictions.
683683
684- For binary classification with a true label :math:`y_t \in \{0,1\}`
685- and a probability estimate :math:`y_p = P(y_t = 1)`,
684+ For binary classification with a true label :math:`y \in \{0,1\}`
685+ and a probability estimate :math:`p = \operatorname{Pr}(y = 1)`,
686686the log loss per sample is the negative log-likelihood
687687of the classifier given the true label:
688688
689689.. math::
690690
691- L_{\log}(y_t, y_p) = -\log P(y_t|y_p) = -(y_t \log y_p + (1 - y_t) \log (1 - y_p))
691+ L_{\log}(y, p) = -\log \operatorname{Pr}(y|p) = -(y \log p + (1 - y) \log (1 - p))
692692
693693 This extends to the multiclass case as follows.
694694Let the true labels for a set of samples
695- be encoded as a 1-of-K binary indicator matrix :math:`T`,
696- i.e. :math:`t_{i,k} = 1` if sample :math:`i` has label :math:`k`
695+ be encoded as a 1-of-K binary indicator matrix :math:`Y`,
696+ i.e. :math:`y_{i,k} = 1` if sample :math:`i` has label :math:`k`
697697taken from a set of :math:`K` labels.
698- Let :math:`Y` be a matrix of probability estimates,
699- with :math:`y_{i,k} = P(t_{i,k} = 1)`.
700- Then the total log loss of the whole set is
698+ Let :math:`P` be a matrix of probability estimates,
699+ with :math:`p_{i,k} = \operatorname{Pr}(y_{i,k} = 1)`.
700+ Then the log loss of the whole set is
701701
702702.. math::
703703
704- L_{\log}(T, Y) = -\log P(T|Y) = - \frac{1}{N} \sum_{i=0}^{N-1} \sum_{k=0}^{K-1} t_{i,k} \log y_{i,k}
704+ L_{\log}(Y, P) = -\log \operatorname{Pr}(Y|P) = - \frac{1}{N} \sum_{i=0}^{N-1} \sum_{k=0}^{K-1} y_{i,k} \log p_{i,k}
705705
706- The function :func:`log_loss` computes either total or mean log loss
706+ To see how this generalizes the binary log loss given above,
707+ note that in the binary case,
708+ :math:`p_{i,0} = 1 - p_{i,1}` and :math:`y_{i,0} = 1 - y_{i,1}`,
709+ so expanding the inner sum over :math:`y_{i,k} \in \{0,1\}`
710+ gives the binary log loss.
711+
712+ The function :func:`log_loss` computes log loss
707713given a list of ground-truth labels and a probability matrix,
708714as returned by an estimator's ``predict_proba`` method.
709715
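The claim in the added paragraph, that the multiclass formula reduces to the binary one, can be checked numerically. The following is a minimal sketch in plain Python, not part of the patch and not the library's implementation. The helper names `binary_log_loss` and `multiclass_log_loss` and the sample values are hypothetical, and the sketch assumes every probability lies strictly between 0 and 1, so no clipping is done.

```python
import math


def binary_log_loss(y, p):
    """Per-sample binary log loss: -(y log p + (1 - y) log(1 - p)).

    Hypothetical helper; assumes 0 < p < 1.
    """
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def multiclass_log_loss(Y, P):
    """Mean log loss for 1-of-K indicator rows Y and probability rows P.

    Hypothetical helper implementing
    -(1/N) * sum_i sum_k y_{i,k} log p_{i,k}.
    """
    total = 0.0
    for y_row, p_row in zip(Y, P):
        for y_ik, p_ik in zip(y_row, p_row):
            if y_ik:  # terms with y_{i,k} = 0 contribute nothing
                total += y_ik * math.log(p_ik)
    return -total / len(Y)


# Illustrative data: true labels and estimated Pr(y = 1) for three samples.
y_true = [1, 0, 1]
p_pos = [0.9, 0.2, 0.6]

# Two-column encoding: column 0 is class 0, column 1 is class 1,
# so y_{i,0} = 1 - y_{i,1} and p_{i,0} = 1 - p_{i,1}.
Y = [[1 - y, y] for y in y_true]
P = [[1 - p, p] for p in p_pos]

mean_binary = sum(binary_log_loss(y, p) for y, p in zip(y_true, p_pos)) / len(y_true)
mean_multi = multiclass_log_loss(Y, P)

print(math.isclose(mean_binary, mean_multi))
```

Averaging the per-sample binary loss and evaluating the two-column multiclass sum give the same number here, which is the expansion the new paragraph describes.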