
Commit 80c0a29

DOC improve the ROC-AUC docstring (scikit-learn#18110)
Co-authored-by: Thomas J. Fan <[email protected]>
1 parent 1df5d94 commit 80c0a29

File tree

2 files changed: +128, -27 lines changed

doc/modules/model_evaluation.rst

Lines changed: 66 additions & 11 deletions
@@ -1326,21 +1326,48 @@ area under the roc curve, the curve information is summarized in one number.
 For more information see the `Wikipedia article on AUC
 <https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve>`_.
 
->>> import numpy as np
+Compared to metrics such as the subset accuracy, the Hamming loss, or the
+F1 score, ROC doesn't require optimizing a threshold for each label.
+
+.. _roc_auc_binary:
+
+Binary case
+^^^^^^^^^^^
+
+In the **binary case**, you can provide either the probability estimates,
+using the `classifier.predict_proba()` method, or the non-thresholded
+decision values given by the `classifier.decision_function()` method. When
+providing probability estimates, use the probability of the class with the
+"greater label". The "greater label" corresponds to
+`classifier.classes_[1]` and thus `classifier.predict_proba(X)[:, 1]`.
+Therefore, the `y_score` parameter is of size (n_samples,).
+
+>>> from sklearn.datasets import load_breast_cancer
+>>> from sklearn.linear_model import LogisticRegression
 >>> from sklearn.metrics import roc_auc_score
->>> y_true = np.array([0, 0, 1, 1])
->>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
->>> roc_auc_score(y_true, y_scores)
-0.75
+>>> X, y = load_breast_cancer(return_X_y=True)
+>>> clf = LogisticRegression(solver="liblinear").fit(X, y)
+>>> clf.classes_
+array([0, 1])
 
-In multi-label classification, the :func:`roc_auc_score` function is
-extended by averaging over the labels as :ref:`above <average>`.
+We can use the probability estimates corresponding to `clf.classes_[1]`.
 
-Compared to metrics such as the subset accuracy, the Hamming loss, or the
-F1 score, ROC doesn't require optimizing a threshold for each label.
+>>> y_score = clf.predict_proba(X)[:, 1]
+>>> roc_auc_score(y, y_score)
+0.99...
+
+Otherwise, we can use the non-thresholded decision values:
 
-The :func:`roc_auc_score` function can also be used in multi-class
-classification. Two averaging strategies are currently supported: the
+>>> roc_auc_score(y, clf.decision_function(X))
+0.99...
+
+.. _roc_auc_multiclass:
+
+Multi-class case
+^^^^^^^^^^^^^^^^
+
+The :func:`roc_auc_score` function can also be used in **multi-class
+classification**. Two averaging strategies are currently supported: the
 one-vs-one algorithm computes the average of the pairwise ROC AUC scores, and
 the one-vs-rest algorithm computes the average of the ROC AUC scores for each
 class against all other classes. In both cases, the predicted labels are
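The two binary-case doctests above print the same value, and that is no
accident: ROC-AUC depends only on how the samples are ranked, and
`predict_proba(X)[:, 1]` is a strictly increasing (sigmoid) transform of
`decision_function(X)`. A minimal editorial sketch, not part of the commit,
illustrating that rank invariance:

    # ROC-AUC is invariant under strictly increasing score transforms.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    X, y = load_breast_cancer(return_X_y=True)
    clf = LogisticRegression(solver="liblinear").fit(X, y)

    scores = clf.decision_function(X)  # unbounded decision values
    auc = roc_auc_score(y, scores)

    # An affine transform with positive slope preserves the ranking,
    # so the ROC curve, and hence the AUC, is unchanged.
    assert np.isclose(roc_auc_score(y, 2.0 * scores + 3.0), auc)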
@@ -1394,6 +1421,34 @@ to the given limit.
    :scale: 75
    :align: center
 
+.. _roc_auc_multilabel:
+
+Multi-label case
+^^^^^^^^^^^^^^^^
+
+In **multi-label classification**, the :func:`roc_auc_score` function is
+extended by averaging over the labels as :ref:`above <average>`. In this case,
+you should provide a `y_score` of shape `(n_samples, n_classes)`. Thus, when
+using the probability estimates, one needs to select the probability of the
+class with the greater label for each output.
+
+>>> from sklearn.datasets import make_multilabel_classification
+>>> from sklearn.multioutput import MultiOutputClassifier
+>>> X, y = make_multilabel_classification(random_state=0)
+>>> inner_clf = LogisticRegression(solver="liblinear", random_state=0)
+>>> clf = MultiOutputClassifier(inner_clf).fit(X, y)
+>>> y_score = np.transpose([y_pred[:, 1] for y_pred in clf.predict_proba(X)])
+>>> roc_auc_score(y, y_score, average=None)
+array([0.82..., 0.86..., 0.94..., 0.85..., 0.94...])
+
+And the decision values do not require such processing.
+
+>>> from sklearn.linear_model import RidgeClassifierCV
+>>> clf = RidgeClassifierCV().fit(X, y)
+>>> y_score = clf.decision_function(X)
+>>> roc_auc_score(y, y_score, average=None)
+array([0.81..., 0.84..., 0.93..., 0.87..., 0.94...])
+
 .. topic:: Examples:
 
   * See :ref:`sphx_glr_auto_examples_model_selection_plot_roc.py`
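To make the one-vs-rest strategy described above concrete, here is an
editorial sketch (not part of the commit) that reproduces
`multi_class="ovr"` by hand, assuming the iris setup also used in the
docstring examples below:

    # Macro-averaged one-vs-rest ROC-AUC computed manually.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    X, y = load_iris(return_X_y=True)
    proba = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)

    # One binary AUC per class (this class vs. all the others), then an
    # unweighted mean: this is what multi_class="ovr" computes with the
    # default average="macro".
    per_class = [roc_auc_score(y == c, proba[:, c]) for c in np.unique(y)]
    manual = np.mean(per_class)

    assert np.isclose(manual, roc_auc_score(y, proba, multi_class="ovr"))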

sklearn/metrics/_ranking.py

Lines changed: 62 additions & 16 deletions
@@ -360,16 +360,31 @@ def roc_auc_score(y_true, y_score, *, average="macro", sample_weight=None,
         binary label indicators with shape (n_samples, n_classes).
 
     y_score : array-like of shape (n_samples,) or (n_samples, n_classes)
-        Target scores. In the binary and multilabel cases, these can be either
-        probability estimates or non-thresholded decision values (as returned
-        by `decision_function` on some classifiers). In the multiclass case,
-        these must be probability estimates which sum to 1. The binary
-        case expects a shape (n_samples,), and the scores must be the scores of
-        the class with the greater label. The multiclass and multilabel
-        cases expect a shape (n_samples, n_classes). In the multiclass case,
-        the order of the class scores must correspond to the order of
-        ``labels``, if provided, or else to the numerical or lexicographical
-        order of the labels in ``y_true``.
+        Target scores.
+
+        * In the binary case, it corresponds to an array of shape
+          `(n_samples,)`. Both probability estimates and non-thresholded
+          decision values can be provided. The probability estimates
+          correspond to the **probability of the class with the greater
+          label**, i.e. `estimator.classes_[1]` and thus
+          `estimator.predict_proba(X)[:, 1]`. The decision values
+          correspond to the output of `estimator.decision_function(X)`.
+          See more information in the :ref:`User guide <roc_auc_binary>`;
+        * In the multiclass case, it corresponds to an array of shape
+          `(n_samples, n_classes)` of probability estimates provided by the
+          `predict_proba` method. The probability estimates **must**
+          sum to 1 across the possible classes. In addition, the order of the
+          class scores must correspond to the order of ``labels``,
+          if provided, or else to the numerical or lexicographical order of
+          the labels in ``y_true``. See more information in the
+          :ref:`User guide <roc_auc_multiclass>`;
+        * In the multilabel case, it corresponds to an array of shape
+          `(n_samples, n_classes)`. Probability estimates are provided by the
+          `predict_proba` method and the non-thresholded decision values by
+          the `decision_function` method. The probability estimates
+          correspond to the **probability of the class with the greater
+          label for each output** of the classifier. See more information
+          in the :ref:`User guide <roc_auc_multilabel>`.
 
     average : {'micro', 'macro', 'samples', 'weighted'} or None, \
             default='macro'
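The three bullets above each pin down a shape contract for `y_score`. A short
editorial sketch (not part of the commit) that checks the binary and
multiclass contracts explicitly:

    # Expected y_score shapes for the binary and multiclass cases.
    import numpy as np
    from sklearn.datasets import load_breast_cancer, load_iris
    from sklearn.linear_model import LogisticRegression

    # Binary: y_score is 1-D, the positive-class column of predict_proba.
    Xb, yb = load_breast_cancer(return_X_y=True)
    clf = LogisticRegression(solver="liblinear").fit(Xb, yb)
    assert clf.predict_proba(Xb)[:, 1].shape == (Xb.shape[0],)

    # Multiclass: y_score is (n_samples, n_classes) and each row sums to 1,
    # which is exactly what predict_proba guarantees.
    Xm, ym = load_iris(return_X_y=True)
    proba = LogisticRegression(max_iter=1000).fit(Xm, ym).predict_proba(Xm)
    assert proba.shape == (Xm.shape[0], 3)
    assert np.allclose(proba.sum(axis=1), 1.0)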
@@ -447,7 +462,7 @@ def roc_auc_score(y_true, y_score, *, average="macro", sample_weight=None,
            Machine Learning, 45(2), 171-186.
            <http://link.springer.com/article/10.1023/A:1010920819831>`_
 
-    See also
+    See Also
     --------
     average_precision_score : Area under the precision-recall curve
@@ -457,12 +472,43 @@ def roc_auc_score(y_true, y_score, *, average="macro", sample_weight=None,
 
     Examples
     --------
-    >>> import numpy as np
+    Binary case:
+
+    >>> from sklearn.datasets import load_breast_cancer
+    >>> from sklearn.linear_model import LogisticRegression
     >>> from sklearn.metrics import roc_auc_score
-    >>> y_true = np.array([0, 0, 1, 1])
-    >>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
-    >>> roc_auc_score(y_true, y_scores)
-    0.75
+    >>> X, y = load_breast_cancer(return_X_y=True)
+    >>> clf = LogisticRegression(solver="liblinear", random_state=0).fit(X, y)
+    >>> roc_auc_score(y, clf.predict_proba(X)[:, 1])
+    0.99...
+    >>> roc_auc_score(y, clf.decision_function(X))
+    0.99...
+
+    Multiclass case:
+
+    >>> from sklearn.datasets import load_iris
+    >>> X, y = load_iris(return_X_y=True)
+    >>> clf = LogisticRegression(solver="liblinear").fit(X, y)
+    >>> roc_auc_score(y, clf.predict_proba(X), multi_class='ovr')
+    0.99...
+
+    Multilabel case:
+
+    >>> import numpy as np
+    >>> from sklearn.datasets import make_multilabel_classification
+    >>> from sklearn.multioutput import MultiOutputClassifier
+    >>> X, y = make_multilabel_classification(random_state=0)
+    >>> clf = MultiOutputClassifier(clf).fit(X, y)
+    >>> # predict_proba returns a list of n_output probability arrays,
+    >>> # each of shape (n_samples, n_classes)
+    >>> y_pred = clf.predict_proba(X)
+    >>> # extract the positive columns for each output
+    >>> y_pred = np.transpose([pred[:, 1] for pred in y_pred])
+    >>> roc_auc_score(y, y_pred, average=None)
+    array([0.82..., 0.86..., 0.94..., 0.85..., 0.94...])
+    >>> from sklearn.linear_model import RidgeClassifierCV
+    >>> clf = RidgeClassifierCV().fit(X, y)
+    >>> roc_auc_score(y, clf.decision_function(X), average=None)
+    array([0.81..., 0.84..., 0.93..., 0.87..., 0.94...])
     """
 
     y_type = type_of_target(y_true)
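One step of the multilabel doctest is worth unpacking:
`MultiOutputClassifier.predict_proba` returns a list of `n_outputs` arrays,
each of shape `(n_samples, 2)`, rather than a single matrix, so the
positive-class columns must be gathered by hand. An editorial sketch of that
reshaping, under the same setup as the doctest:

    # Building the (n_samples, n_outputs) multilabel y_score.
    import numpy as np
    from sklearn.datasets import make_multilabel_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.multioutput import MultiOutputClassifier

    X, y = make_multilabel_classification(random_state=0)  # y: 5 labels
    clf = MultiOutputClassifier(
        LogisticRegression(solver="liblinear", random_state=0)
    ).fit(X, y)

    pred_list = clf.predict_proba(X)  # list of arrays, each (n_samples, 2)
    assert len(pred_list) == y.shape[1]

    # Keep column 1 (the class with the greater label) of each output and
    # stack the columns side by side.
    y_score = np.transpose([p[:, 1] for p in pred_list])
    assert y_score.shape == y.shape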
