
Commit 311032a

committed: Pushing the docs to dev/ for branch: master, commit e0f0b3de2f92a0a57839cda7858a016465fb413a
1 parent 8a98c73 commit 311032a

File tree

1,415 files changed: +13,244 −6,904 lines changed


dev/_downloads/00701bf1048deb8daeb5ad086596d260/plot_lasso_lars.ipynb

+1 −1

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Lasso path using LARS\n\n\nComputes Lasso Path along the regularization parameter using the LARS\nalgorithm on the diabetes dataset. Each color represents a different\nfeature of the coefficient vector, and this is displayed as a function\nof the regularization parameter.\n\n\n"
+"\n# Lasso path using LARS\n\n\nComputes Lasso Path along the regularization parameter using the LARS\nalgorithm on the diabetes dataset. Each color represents a different\nfeature of the coefficient vector, and this is displayed as a function\nof the regularization parameter.\n"
 ]
 },
 {
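The notebook header above describes computing the Lasso path with LARS on the diabetes data. A minimal sketch of that technique, assuming the standard scikit-learn lars_path API with method="lasso" and simple matplotlib plotting rather than the notebook's exact figure code:

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.linear_model import lars_path

# Diabetes data: 442 samples, 10 features.
X, y = datasets.load_diabetes(return_X_y=True)

# method="lasso" makes lars_path return the Lasso regularization path.
alphas, _, coefs = lars_path(X, y, method="lasso")

# One curve per feature coefficient, as a function of the regularization level.
plt.plot(alphas, coefs.T)
plt.xlabel("alpha")
plt.ylabel("coefficients")
plt.show()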

dev/_downloads/00727cbc15047062964b3f55fc4571b7/plot_label_propagation_digits_active_learning.ipynb

+1 −1

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Label Propagation digits active learning\n\n\nDemonstrates an active learning technique to learn handwritten digits\nusing label propagation.\n\nWe start by training a label propagation model with only 10 labeled points,\nthen we select the top five most uncertain points to label. Next, we train\nwith 15 labeled points (original 10 + 5 new ones). We repeat this process\nfour times to have a model trained with 30 labeled examples. Note you can\nincrease this to label more than 30 by changing `max_iterations`. Labeling\nmore than 30 can be useful to get a sense for the speed of convergence of\nthis active learning technique.\n\nA plot will appear showing the top 5 most uncertain digits for each iteration\nof training. These may or may not contain mistakes, but we will train the next\nmodel with their true labels.\n\n"
+"\n# Label Propagation digits active learning\n\n\nDemonstrates an active learning technique to learn handwritten digits\nusing label propagation.\n\nWe start by training a label propagation model with only 10 labeled points,\nthen we select the top five most uncertain points to label. Next, we train\nwith 15 labeled points (original 10 + 5 new ones). We repeat this process\nfour times to have a model trained with 30 labeled examples. Note you can\nincrease this to label more than 30 by changing `max_iterations`. Labeling\nmore than 30 can be useful to get a sense for the speed of convergence of\nthis active learning technique.\n\nA plot will appear showing the top 5 most uncertain digits for each iteration\nof training. These may or may not contain mistakes, but we will train the next\nmodel with their true labels.\n"
 ]
 },
 {
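The description above walks through an active-learning loop with label propagation. A hedged sketch of a single round, where the digits subset size (330) and the kernel parameters are illustrative choices rather than the notebook's exact values:

import numpy as np
from scipy.stats import entropy
from sklearn.datasets import load_digits
from sklearn.semi_supervised import LabelPropagation

X, y = load_digits(return_X_y=True)
X, y = X[:330], y[:330]

n_labeled = 10
y_train = np.full(len(y), -1)        # -1 marks unlabeled points
y_train[:n_labeled] = y[:n_labeled]

lp = LabelPropagation(gamma=0.25, max_iter=20).fit(X, y_train)

# Uncertainty = entropy of the predicted label distribution per sample.
uncertainty = entropy(lp.label_distributions_.T)

# Query the five most uncertain unlabeled points for their true labels.
unlabeled = np.where(y_train == -1)[0]
most_uncertain = unlabeled[np.argsort(uncertainty[unlabeled])[-5:]]
print("query these indices for true labels:", most_uncertain)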

dev/_downloads/021e92302b91d52cc0533a6d8c7016e7/plot_tree_regression.ipynb

+1 −1

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Decision Tree Regression\n\n\nA 1D regression with decision tree.\n\nThe `decision trees <tree>` is\nused to fit a sine curve with addition noisy observation. As a result, it\nlearns local linear regressions approximating the sine curve.\n\nWe can see that if the maximum depth of the tree (controlled by the\n`max_depth` parameter) is set too high, the decision trees learn too fine\ndetails of the training data and learn from the noise, i.e. they overfit.\n\n"
+"\n# Decision Tree Regression\n\n\nA 1D regression with decision tree.\n\nThe `decision trees <tree>` is\nused to fit a sine curve with addition noisy observation. As a result, it\nlearns local linear regressions approximating the sine curve.\n\nWe can see that if the maximum depth of the tree (controlled by the\n`max_depth` parameter) is set too high, the decision trees learn too fine\ndetails of the training data and learn from the noise, i.e. they overfit.\n"
 ]
 },
 {
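As a rough sketch of the setup described above (noisy sine data, shallow vs. deep trees), assuming synthetic data and depths chosen only for illustration:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))   # add noise to every 5th target

X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
for depth in (2, 5):
    tree = DecisionTreeRegressor(max_depth=depth).fit(X, y)
    plt.plot(X_test, tree.predict(X_test), label=f"max_depth={depth}")
plt.scatter(X, y, s=20, c="darkorange", label="data")
plt.legend()
plt.show()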

dev/_downloads/0285d04f43da1e9f71cd93c44a262e4a/plot_sparse_coding.ipynb

+1 −1

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Sparse coding with a precomputed dictionary\n\n\nTransform a signal as a sparse combination of Ricker wavelets. This example\nvisually compares different sparse coding methods using the\n:class:`sklearn.decomposition.SparseCoder` estimator. The Ricker (also known\nas Mexican hat or the second derivative of a Gaussian) is not a particularly\ngood kernel to represent piecewise constant signals like this one. It can\ntherefore be seen how much adding different widths of atoms matters and it\ntherefore motivates learning the dictionary to best fit your type of signals.\n\nThe richer dictionary on the right is not larger in size, heavier subsampling\nis performed in order to stay on the same order of magnitude.\n\n"
+"\n# Sparse coding with a precomputed dictionary\n\n\nTransform a signal as a sparse combination of Ricker wavelets. This example\nvisually compares different sparse coding methods using the\n:class:`sklearn.decomposition.SparseCoder` estimator. The Ricker (also known\nas Mexican hat or the second derivative of a Gaussian) is not a particularly\ngood kernel to represent piecewise constant signals like this one. It can\ntherefore be seen how much adding different widths of atoms matters and it\ntherefore motivates learning the dictionary to best fit your type of signals.\n\nThe richer dictionary on the right is not larger in size, heavier subsampling\nis performed in order to stay on the same order of magnitude.\n"
 ]
 },
 {
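A minimal sketch of sparse coding against a fixed Ricker-wavelet dictionary, as described above; the wavelet helper, the test signal, and the OMP settings are illustrative assumptions, not the notebook's code:

import numpy as np
from sklearn.decomposition import SparseCoder

def ricker(resolution, center, width):
    # Discrete Ricker (Mexican hat) wavelet centred at `center`.
    x = np.linspace(0, resolution - 1, resolution)
    return ((2 / (np.sqrt(3 * width) * np.pi ** 0.25))
            * (1 - ((x - center) / width) ** 2)
            * np.exp(-((x - center) ** 2) / (2 * width ** 2)))

resolution = 1024
D = np.array([ricker(resolution, c, width=100)
              for c in np.linspace(0, resolution, 30)])
D /= np.sqrt(np.sum(D ** 2, axis=1))[:, np.newaxis]   # normalise atoms

signal = np.zeros(resolution)
signal[200:600] = 1.0                                  # a piecewise-constant step

coder = SparseCoder(dictionary=D, transform_algorithm="omp",
                    transform_n_nonzero_coefs=5)
code = coder.transform(signal.reshape(1, -1))
reconstruction = code @ D
print("non-zero atoms used:", np.count_nonzero(code))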

dev/_downloads/02c23dfdb2c05d59dbdb168b1fdcb3d8/plot_feature_agglomeration_vs_univariate_selection.ipynb

+1 −1

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n==============================================\nFeature agglomeration vs. univariate selection\n==============================================\n\nThis example compares 2 dimensionality reduction strategies:\n\n- univariate feature selection with Anova\n\n- feature agglomeration with Ward hierarchical clustering\n\nBoth methods are compared in a regression problem using\na BayesianRidge as supervised estimator.\n\n"
+"\n==============================================\nFeature agglomeration vs. univariate selection\n==============================================\n\nThis example compares 2 dimensionality reduction strategies:\n\n- univariate feature selection with Anova\n\n- feature agglomeration with Ward hierarchical clustering\n\nBoth methods are compared in a regression problem using\na BayesianRidge as supervised estimator.\n"
 ]
 },
 {
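A hedged sketch of the comparison described above, on synthetic regression data and without the notebook's spatial connectivity structure (an assumption made to keep the example short):

from sklearn.cluster import FeatureAgglomeration
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_regression(n_samples=200, n_features=100, n_informative=10,
                       noise=10.0, random_state=0)

# Strategy 1: keep the 10 best features according to an Anova F-test.
anova = make_pipeline(SelectKBest(f_regression, k=10), BayesianRidge())
# Strategy 2: merge the 100 features into 10 Ward clusters.
ward = make_pipeline(FeatureAgglomeration(n_clusters=10), BayesianRidge())

print("anova :", cross_val_score(anova, X, y, cv=5).mean())
print("ward  :", cross_val_score(ward, X, y, cv=5).mean())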

dev/_downloads/0314576f362d54c36e37904494f4c7d9/plot_gpc_iris.ipynb

+1 −1

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n=====================================================\nGaussian process classification (GPC) on iris dataset\n=====================================================\n\nThis example illustrates the predicted probability of GPC for an isotropic\nand anisotropic RBF kernel on a two-dimensional version for the iris-dataset.\nThe anisotropic RBF kernel obtains slightly higher log-marginal-likelihood by\nassigning different length-scales to the two feature dimensions.\n\n"
+"\n=====================================================\nGaussian process classification (GPC) on iris dataset\n=====================================================\n\nThis example illustrates the predicted probability of GPC for an isotropic\nand anisotropic RBF kernel on a two-dimensional version for the iris-dataset.\nThe anisotropic RBF kernel obtains slightly higher log-marginal-likelihood by\nassigning different length-scales to the two feature dimensions.\n"
 ]
 },
 {
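A minimal sketch of the isotropic vs. anisotropic comparison above, assuming the first two iris features and default optimizer settings:

from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

X, y = load_iris(return_X_y=True)
X = X[:, :2]                       # keep two features for a 2D problem

# An isotropic RBF shares one length-scale; the anisotropic RBF learns one per feature.
for name, kernel in [("isotropic", 1.0 * RBF(length_scale=1.0)),
                     ("anisotropic", 1.0 * RBF(length_scale=[1.0, 1.0]))]:
    gpc = GaussianProcessClassifier(kernel=kernel).fit(X, y)
    print(name, "log-marginal-likelihood:",
          gpc.log_marginal_likelihood(gpc.kernel_.theta))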

dev/_downloads/039de292122a5274c38d249b744c9b55/plot_pca_vs_lda.ipynb

+1 −1

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Comparison of LDA and PCA 2D projection of Iris dataset\n\n\nThe Iris dataset represents 3 kind of Iris flowers (Setosa, Versicolour\nand Virginica) with 4 attributes: sepal length, sepal width, petal length\nand petal width.\n\nPrincipal Component Analysis (PCA) applied to this data identifies the\ncombination of attributes (principal components, or directions in the\nfeature space) that account for the most variance in the data. Here we\nplot the different samples on the 2 first principal components.\n\nLinear Discriminant Analysis (LDA) tries to identify attributes that\naccount for the most variance *between classes*. In particular,\nLDA, in contrast to PCA, is a supervised method, using known class labels.\n\n"
+"\n# Comparison of LDA and PCA 2D projection of Iris dataset\n\n\nThe Iris dataset represents 3 kind of Iris flowers (Setosa, Versicolour\nand Virginica) with 4 attributes: sepal length, sepal width, petal length\nand petal width.\n\nPrincipal Component Analysis (PCA) applied to this data identifies the\ncombination of attributes (principal components, or directions in the\nfeature space) that account for the most variance in the data. Here we\nplot the different samples on the 2 first principal components.\n\nLinear Discriminant Analysis (LDA) tries to identify attributes that\naccount for the most variance *between classes*. In particular,\nLDA, in contrast to PCA, is a supervised method, using known class labels.\n"
 ]
 },
 {
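A short sketch of the projection comparison described above; the plotting is deliberately minimal and not the notebook's figure code:

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)                            # unsupervised
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # supervised

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
ax1.set_title("PCA")
ax2.scatter(X_lda[:, 0], X_lda[:, 1], c=y)
ax2.set_title("LDA")
plt.show()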

dev/_downloads/053f85f102c07abae7dc20a87b112911/plot_gpr_noisy_targets.ipynb

+1 −1

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n=========================================================\nGaussian Processes regression: basic introductory example\n=========================================================\n\nA simple one-dimensional regression example computed in two different ways:\n\n1. A noise-free case\n2. A noisy case with known noise-level per datapoint\n\nIn both cases, the kernel's parameters are estimated using the maximum\nlikelihood principle.\n\nThe figures illustrate the interpolating property of the Gaussian Process\nmodel as well as its probabilistic nature in the form of a pointwise 95%\nconfidence interval.\n\nNote that the parameter ``alpha`` is applied as a Tikhonov\nregularization of the assumed covariance between the training points.\n\n"
+"\n=========================================================\nGaussian Processes regression: basic introductory example\n=========================================================\n\nA simple one-dimensional regression example computed in two different ways:\n\n1. A noise-free case\n2. A noisy case with known noise-level per datapoint\n\nIn both cases, the kernel's parameters are estimated using the maximum\nlikelihood principle.\n\nThe figures illustrate the interpolating property of the Gaussian Process\nmodel as well as its probabilistic nature in the form of a pointwise 95%\nconfidence interval.\n\nNote that the parameter ``alpha`` is applied as a Tikhonov\nregularization of the assumed covariance between the training points.\n"
 ]
 },
 {
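A hedged sketch of the noisy case described above, where a known per-point noise level enters through alpha; the data-generating function and kernel bounds are illustrative assumptions:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

rng = np.random.RandomState(0)
X = np.atleast_2d(np.linspace(0.1, 9.9, 20)).T
dy = 0.5 + 1.0 * rng.rand(X.shape[0])          # known noise level per point
y = X.ravel() * np.sin(X.ravel()) + rng.normal(0, dy)

kernel = C(1.0, (1e-3, 1e3)) * RBF(10, (1e-2, 1e2))
gp = GaussianProcessRegressor(kernel=kernel, alpha=dy ** 2,
                              n_restarts_optimizer=10).fit(X, y)

x = np.atleast_2d(np.linspace(0, 10, 1000)).T
y_pred, sigma = gp.predict(x, return_std=True)
# A pointwise 95% confidence interval is roughly y_pred +/- 1.96 * sigma.
print("learned kernel:", gp.kernel_)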

dev/_downloads/05af92ebbf361ad6b838e9d4e34901da/plot_sgd_penalties.ipynb

+1 −1

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n==============\nSGD: Penalties\n==============\n\nContours of where the penalty is equal to 1\nfor the three penalties L1, L2 and elastic-net.\n\nAll of the above are supported by\n:class:`sklearn.linear_model.stochastic_gradient`.\n\n\n"
+"\n==============\nSGD: Penalties\n==============\n\nContours of where the penalty is equal to 1\nfor the three penalties L1, L2 and elastic-net.\n\nAll of the above are supported by\n:class:`sklearn.linear_model.stochastic_gradient`.\n"
 ]
 },
 {
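A rough numpy/matplotlib sketch of the unit-penalty contours mentioned above; the elastic-net mixing weight of 0.5 is an assumption for illustration and not necessarily the convention hard-coded in the notebook:

import numpy as np
import matplotlib.pyplot as plt

xs = np.linspace(-1.5, 1.5, 500)
xx, yy = np.meshgrid(xs, xs)

l1 = np.abs(xx) + np.abs(yy)
l2 = xx ** 2 + yy ** 2
alpha = 0.5                                  # assumed elastic-net mixing parameter
enet = alpha * l1 + (1 - alpha) * l2

# Draw the level set where each penalty equals 1.
plt.contour(xx, yy, l1, levels=[1], colors="red")
plt.contour(xx, yy, l2, levels=[1], colors="blue")
plt.contour(xx, yy, enet, levels=[1], colors="green")
plt.gca().set_aspect("equal")
plt.title("penalty == 1: L1 (red), L2 (blue), elastic-net (green)")
plt.show()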

dev/_downloads/06d1bf4510bd82b665c44c2bd2c364ae/plot_feature_selection_pipeline.ipynb

+1 −1

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Pipeline Anova SVM\n\n\nSimple usage of Pipeline that runs successively a univariate\nfeature selection with anova and then a SVM of the selected features.\n\nUsing a sub-pipeline, the fitted coefficients can be mapped back into\nthe original feature space.\n\n"
+"\n# Pipeline Anova SVM\n\n\nSimple usage of Pipeline that runs successively a univariate\nfeature selection with anova and then a SVM of the selected features.\n\nUsing a sub-pipeline, the fitted coefficients can be mapped back into\nthe original feature space.\n"
 ]
 },
 {
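A minimal sketch of an Anova-then-SVM pipeline as described above, on synthetic data; the choice of k=3 selected features and LinearSVC is illustrative:

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

X, y = make_classification(n_features=20, n_informative=3, n_redundant=0,
                           n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

anova_svm = make_pipeline(SelectKBest(f_classif, k=3), LinearSVC())
anova_svm.fit(X_train, y_train)
print("test accuracy:", anova_svm.score(X_test, y_test))

# Map the fitted coefficients back into the original 20-feature space
# by inverse-transforming through the selection step.
coef = anova_svm[-1].coef_
print("coef in selected space:", coef.shape)
print("coef in original space:", anova_svm[:-1].inverse_transform(coef).shape)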

dev/_downloads/0855ac0efd714397341de370a68cf6f3/plot_mean_shift.ipynb

+1 −1

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# A demo of the mean-shift clustering algorithm\n\n\nReference:\n\nDorin Comaniciu and Peter Meer, \"Mean Shift: A robust approach toward\nfeature space analysis\". IEEE Transactions on Pattern Analysis and\nMachine Intelligence. 2002. pp. 603-619.\n\n\n"
+"\n# A demo of the mean-shift clustering algorithm\n\n\nReference:\n\nDorin Comaniciu and Peter Meer, \"Mean Shift: A robust approach toward\nfeature space analysis\". IEEE Transactions on Pattern Analysis and\nMachine Intelligence. 2002. pp. 603-619.\n\n\n"
 ]
 },
 {
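A short sketch of a mean-shift run in the spirit of the demo above, assuming synthetic blobs and a bandwidth picked with estimate_bandwidth; none of these values come from the notebook itself:

import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=10000, centers=centers, cluster_std=0.6,
                  random_state=0)

# Estimate a kernel bandwidth from the data, then run mean shift with bin seeding.
bandwidth = estimate_bandwidth(X, quantile=0.2, n_samples=500)
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(X)

print("estimated clusters:", len(np.unique(ms.labels_)))
print("cluster centers:\n", ms.cluster_centers_)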

dev/_downloads/0d59ba71a84b25ededa8e1298aed7cf2/plot_transformed_target.ipynb

+1 −1

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Effect of transforming the targets in regression model\n\n\nIn this example, we give an overview of the\n:class:`sklearn.compose.TransformedTargetRegressor`. Two examples\nillustrate the benefit of transforming the targets before learning a linear\nregression model. The first example uses synthetic data while the second\nexample is based on the Boston housing data set.\n\n\n"
+"\n# Effect of transforming the targets in regression model\n\n\nIn this example, we give an overview of the\n:class:`sklearn.compose.TransformedTargetRegressor`. Two examples\nillustrate the benefit of transforming the targets before learning a linear\nregression model. The first example uses synthetic data while the second\nexample is based on the Boston housing data set.\n"
 ]
 },
 {
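A hedged sketch of the synthetic-data half of the example above; the log1p/expm1 transform pair, the way the target is skewed, and the RidgeCV regressor are assumptions chosen for brevity:

import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=10000, noise=100, random_state=0)
y = np.expm1((y + abs(y.min())) / 200)       # make the target distribution skewed

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

raw = RidgeCV().fit(X_train, y_train)
transformed = TransformedTargetRegressor(regressor=RidgeCV(),
                                         func=np.log1p,
                                         inverse_func=np.expm1).fit(X_train, y_train)

print("R2 without target transform:", raw.score(X_test, y_test))
print("R2 with target transform   :", transformed.score(X_test, y_test))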

dev/_downloads/0e54710c34c6326f0886857069d1cc1f/plot_permutation_importance_multicollinear.ipynb

+1 −1

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Permutation Importance with Multicollinear or Correlated Features\n\n\nIn this example, we compute the permutation importance on the Wisconsin\nbreast cancer dataset using :func:`~sklearn.inspection.permutation_importance`.\nThe :class:`~sklearn.ensemble.RandomForestClassifier` can easily get about 97%\naccuracy on a test dataset. Because this dataset contains multicollinear\nfeatures, the permutation importance will show that none of the features are\nimportant. One approach to handling multicollinearity is by performing\nhierarchical clustering on the features' Spearman rank-order correlations,\npicking a threshold, and keeping a single feature from each cluster.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>See also\n `sphx_glr_auto_examples_inspection_plot_permutation_importance.py`</p></div>\n\n"
+"\n# Permutation Importance with Multicollinear or Correlated Features\n\n\nIn this example, we compute the permutation importance on the Wisconsin\nbreast cancer dataset using :func:`~sklearn.inspection.permutation_importance`.\nThe :class:`~sklearn.ensemble.RandomForestClassifier` can easily get about 97%\naccuracy on a test dataset. Because this dataset contains multicollinear\nfeatures, the permutation importance will show that none of the features are\nimportant. One approach to handling multicollinearity is by performing\nhierarchical clustering on the features' Spearman rank-order correlations,\npicking a threshold, and keeping a single feature from each cluster.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>See also\n `sphx_glr_auto_examples_inspection_plot_permutation_importance.py`</p></div>\n"
 ]
 },
 {
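A condensed sketch of the clustering-based workaround described above (Spearman correlations, Ward clustering, one feature per cluster, then permutation importance); the distance threshold t=1 is an illustrative assumption:

import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Hierarchical clustering on the Spearman rank-order correlation matrix.
corr = spearmanr(X_train).correlation
corr = (corr + corr.T) / 2
np.fill_diagonal(corr, 1)
dist = squareform(1 - np.abs(corr), checks=False)
linkage = hierarchy.ward(dist)
cluster_ids = hierarchy.fcluster(linkage, t=1, criterion="distance")
selected = [np.where(cluster_ids == c)[0][0] for c in np.unique(cluster_ids)]

clf = RandomForestClassifier(random_state=42).fit(X_train[:, selected], y_train)
print("test accuracy:", clf.score(X_test[:, selected], y_test))

result = permutation_importance(clf, X_test[:, selected], y_test,
                                n_repeats=10, random_state=42)
print("permutation importances:", result.importances_mean)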

dev/_downloads/0f2070eb0ba0c1cd77d1ae6069402bea/plot_random_multilabel_dataset.ipynb

+1 −1

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Plot randomly generated multilabel dataset\n\n\nThis illustrates the `datasets.make_multilabel_classification` dataset\ngenerator. Each sample consists of counts of two features (up to 50 in\ntotal), which are differently distributed in each of two classes.\n\nPoints are labeled as follows, where Y means the class is present:\n\n ===== ===== ===== ======\n 1 2 3 Color\n ===== ===== ===== ======\n Y N N Red\n N Y N Blue\n N N Y Yellow\n Y Y N Purple\n Y N Y Orange\n Y Y N Green\n Y Y Y Brown\n ===== ===== ===== ======\n\nA star marks the expected sample for each class; its size reflects the\nprobability of selecting that class label.\n\nThe left and right examples highlight the ``n_labels`` parameter:\nmore of the samples in the right plot have 2 or 3 labels.\n\nNote that this two-dimensional example is very degenerate:\ngenerally the number of features would be much greater than the\n\"document length\", while here we have much larger documents than vocabulary.\nSimilarly, with ``n_classes > n_features``, it is much less likely that a\nfeature distinguishes a particular class.\n\n"
+"\n# Plot randomly generated multilabel dataset\n\n\nThis illustrates the `datasets.make_multilabel_classification` dataset\ngenerator. Each sample consists of counts of two features (up to 50 in\ntotal), which are differently distributed in each of two classes.\n\nPoints are labeled as follows, where Y means the class is present:\n\n ===== ===== ===== ======\n 1 2 3 Color\n ===== ===== ===== ======\n Y N N Red\n N Y N Blue\n N N Y Yellow\n Y Y N Purple\n Y N Y Orange\n Y Y N Green\n Y Y Y Brown\n ===== ===== ===== ======\n\nA star marks the expected sample for each class; its size reflects the\nprobability of selecting that class label.\n\nThe left and right examples highlight the ``n_labels`` parameter:\nmore of the samples in the right plot have 2 or 3 labels.\n\nNote that this two-dimensional example is very degenerate:\ngenerally the number of features would be much greater than the\n\"document length\", while here we have much larger documents than vocabulary.\nSimilarly, with ``n_classes > n_features``, it is much less likely that a\nfeature distinguishes a particular class.\n"
 ]
 },
 {
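A minimal sketch of the generator call described above; the sample count and n_labels value are illustrative, and label-combination counts are printed instead of reproducing the notebook's scatter plots:

from collections import Counter
from sklearn.datasets import make_multilabel_classification

X, Y = make_multilabel_classification(n_samples=300, n_features=2, n_classes=3,
                                      n_labels=1, length=50,
                                      allow_unlabeled=False, random_state=0)

print("X shape:", X.shape, "Y shape:", Y.shape)
# Count how often each label combination (row of the indicator matrix Y) occurs.
print(Counter(map(tuple, Y)))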

dev/_downloads/0fd95d70777d02a9f972c3bed6186ecb/plot_svm_anova.ipynb

+1 −1

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n=================================================\nSVM-Anova: SVM with univariate feature selection\n=================================================\n\nThis example shows how to perform univariate feature selection before running a\nSVC (support vector classifier) to improve the classification scores. We use\nthe iris dataset (4 features) and add 36 non-informative features. We can find\nthat our model achieves best performance when we select around 10% of features.\n\n"
+"\n=================================================\nSVM-Anova: SVM with univariate feature selection\n=================================================\n\nThis example shows how to perform univariate feature selection before running a\nSVC (support vector classifier) to improve the classification scores. We use\nthe iris dataset (4 features) and add 36 non-informative features. We can find\nthat our model achieves best performance when we select around 10% of features.\n"
 ]
 },
 {
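A short sketch of the selection-percentile sweep described above, assuming 36 uniformly distributed noise features, a standard-scaled SVC, and default 5-fold cross-validation:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
rng = np.random.RandomState(0)
X = np.hstack([X, 2 * rng.rand(X.shape[0], 36)])   # add 36 non-informative features

for percentile in (1, 10, 50, 100):
    clf = make_pipeline(SelectPercentile(f_classif, percentile=percentile),
                        StandardScaler(), SVC(gamma="auto"))
    scores = cross_val_score(clf, X, y)
    print(f"{percentile:>3}% of features: {scores.mean():.3f} +/- {scores.std():.3f}")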

dev/_downloads/14993ca765eb710eea95ea55782ffaf7/plot_digits_linkage.ipynb

+1 −1

@@ -15,7 +15,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Various Agglomerative Clustering on a 2D embedding of digits\n\n\nAn illustration of various linkage option for agglomerative clustering on\na 2D embedding of the digits dataset.\n\nThe goal of this example is to show intuitively how the metrics behave, and\nnot to find good clusters for the digits. This is why the example works on a\n2D embedding.\n\nWhat this example shows us is the behavior \"rich getting richer\" of\nagglomerative clustering that tends to create uneven cluster sizes.\nThis behavior is pronounced for the average linkage strategy,\nthat ends up with a couple of singleton clusters, while in the case\nof single linkage we get a single central cluster with all other clusters\nbeing drawn from noise points around the fringes.\n\n"
+"\n# Various Agglomerative Clustering on a 2D embedding of digits\n\n\nAn illustration of various linkage option for agglomerative clustering on\na 2D embedding of the digits dataset.\n\nThe goal of this example is to show intuitively how the metrics behave, and\nnot to find good clusters for the digits. This is why the example works on a\n2D embedding.\n\nWhat this example shows us is the behavior \"rich getting richer\" of\nagglomerative clustering that tends to create uneven cluster sizes.\nThis behavior is pronounced for the average linkage strategy,\nthat ends up with a couple of singleton clusters, while in the case\nof single linkage we get a single central cluster with all other clusters\nbeing drawn from noise points around the fringes.\n"
 ]
 },
 {
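A hedged sketch of the linkage comparison above, assuming a spectral embedding for the 2D projection and reporting cluster sizes instead of drawing the embeddings:

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_digits
from sklearn.manifold import SpectralEmbedding

X, y = load_digits(return_X_y=True)
X_2d = SpectralEmbedding(n_components=2).fit_transform(X)

for linkage in ("ward", "average", "complete", "single"):
    labels = AgglomerativeClustering(linkage=linkage, n_clusters=10).fit_predict(X_2d)
    # Very uneven cluster sizes reveal the "rich get richer" behaviour.
    print(linkage, sorted(np.bincount(labels), reverse=True))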
