Commit f1c2470 (1 parent: 97e17a5)

Pushing the docs to dev/ for branch: master, commit af4247b152350b4fd0ac8bb9395833bd84e827d2

File tree: 1,104 files changed (+3281 -4327 lines)


dev/_downloads/plot_classifier_chain_yeast.ipynb (+1 -1)

@@ -15,7 +15,7 @@
     "cell_type": "markdown",
     "metadata": {},
     "source": [
-      "\n# Classifier Chain\n\nExample of using classifier chain on a multilabel dataset.\n\nFor this example we will use the `yeast\n<http://mldata.org/repository/data/viewslug/yeast>`_ dataset which contains\n2417 datapoints each with 103 features and 14 possible labels. Each\ndata point has at least one label. As a baseline we first train a logistic\nregression classifier for each of the 14 labels. To evaluate the performance of\nthese classifiers we predict on a held-out test set and calculate the\n`jaccard score <jaccard_score>` for each sample.\n\nNext we create 10 classifier chains. Each classifier chain contains a\nlogistic regression model for each of the 14 labels. The models in each\nchain are ordered randomly. In addition to the 103 features in the dataset,\neach model gets the predictions of the preceding models in the chain as\nfeatures (note that by default at training time each model gets the true\nlabels as features). These additional features allow each chain to exploit\ncorrelations among the classes. The Jaccard similarity score for each chain\ntends to be greater than that of the set independent logistic models.\n\nBecause the models in each chain are arranged randomly there is significant\nvariation in performance among the chains. Presumably there is an optimal\nordering of the classes in a chain that will yield the best performance.\nHowever we do not know that ordering a priori. Instead we can construct an\nvoting ensemble of classifier chains by averaging the binary predictions of\nthe chains and apply a threshold of 0.5. The Jaccard similarity score of the\nensemble is greater than that of the independent models and tends to exceed\nthe score of each chain in the ensemble (although this is not guaranteed\nwith randomly ordered chains).\n\n"
+      "\n# Classifier Chain\n\nExample of using classifier chain on a multilabel dataset.\n\nFor this example we will use the `yeast\n<https://www.openml.org/d/40597>`_ dataset which contains\n2417 datapoints each with 103 features and 14 possible labels. Each\ndata point has at least one label. As a baseline we first train a logistic\nregression classifier for each of the 14 labels. To evaluate the performance of\nthese classifiers we predict on a held-out test set and calculate the\n`jaccard score <jaccard_score>` for each sample.\n\nNext we create 10 classifier chains. Each classifier chain contains a\nlogistic regression model for each of the 14 labels. The models in each\nchain are ordered randomly. In addition to the 103 features in the dataset,\neach model gets the predictions of the preceding models in the chain as\nfeatures (note that by default at training time each model gets the true\nlabels as features). These additional features allow each chain to exploit\ncorrelations among the classes. The Jaccard similarity score for each chain\ntends to be greater than that of the set independent logistic models.\n\nBecause the models in each chain are arranged randomly there is significant\nvariation in performance among the chains. Presumably there is an optimal\nordering of the classes in a chain that will yield the best performance.\nHowever we do not know that ordering a priori. Instead we can construct an\nvoting ensemble of classifier chains by averaging the binary predictions of\nthe chains and apply a threshold of 0.5. The Jaccard similarity score of the\nensemble is greater than that of the independent models and tends to exceed\nthe score of each chain in the ensemble (although this is not guaranteed\nwith randomly ordered chains).\n\n"
     ]
   },
   {

dev/_downloads/plot_classifier_chain_yeast.py (+1 -1)

@@ -5,7 +5,7 @@
 Example of using classifier chain on a multilabel dataset.
 
 For this example we will use the `yeast
-<http://mldata.org/repository/data/viewslug/yeast>`_ dataset which contains
+<https://www.openml.org/d/40597>`_ dataset which contains
 2417 datapoints each with 103 features and 14 possible labels. Each
 data point has at least one label. As a baseline we first train a logistic
 regression classifier for each of the 14 labels. To evaluate the performance of
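The ensemble-of-chains approach described in the example's docstring can be sketched with scikit-learn's `ClassifierChain`. This is a minimal sketch, not the example's exact script: a synthetic multilabel dataset with the same shape (103 features, 14 labels) stands in for the yeast data, and the sample count, solver settings, and number of chains are assumptions.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain

# Synthetic stand-in for the yeast dataset: 103 features, 14 labels.
X, Y = make_multilabel_classification(
    n_samples=500, n_features=103, n_classes=14, random_state=0
)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# 10 classifier chains, each with a randomly shuffled label order.
chains = [
    ClassifierChain(LogisticRegression(max_iter=1000), order="random", random_state=i)
    for i in range(10)
]
for chain in chains:
    chain.fit(X_train, Y_train)

# Average the chains' binary predictions, then threshold at 0.5.
Y_pred_chains = np.array([chain.predict(X_test) for chain in chains])
Y_pred_ensemble = Y_pred_chains.mean(axis=0) >= 0.5

# Sample-wise Jaccard score, as in the example's evaluation.
score = jaccard_score(Y_test, Y_pred_ensemble, average="samples")
print(f"ensemble Jaccard score: {score:.3f}")
```

Each chain feeds the preceding models' predictions to the next model as extra features, which is what lets the ensemble exploit label correlations that the independent per-label baseline cannot.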

dev/_downloads/scikit-learn-docs.pdf (-1.8 KB, binary file not shown)

dev/_images/iris.png and other images under dev/_images/: small binary size deltas (binary files not shown)

dev/_sources/auto_examples/applications/plot_face_recognition.rst.txt (+19 -19)
dev/_sources/auto_examples/applications/plot_model_complexity_influence.rst.txt (+14 -14)
dev/_sources/auto_examples/applications/plot_out_of_core_classification.rst.txt (+30 -30)
dev/_sources/auto_examples/applications/plot_outlier_detection_housing.rst.txt (+1 -1)
