
Commit 2653833

DOC some fixes to the doc build.
1 parent 4bd1286 commit 2653833

25 files changed (+198, -167 lines)

doc/datasets/index.rst

Lines changed: 6 additions & 6 deletions
@@ -267,26 +267,26 @@ features::

 .. include:: rcv1.rst

-.. _boston_house_prices
+.. _boston_house_prices:

 .. include:: ../../sklearn/datasets/descr/boston_house_prices.rst

-.. _breast_cancer
+.. _breast_cancer:

 .. include:: ../../sklearn/datasets/descr/breast_cancer.rst

-.. _diabetes
+.. _diabetes:

 .. include:: ../../sklearn/datasets/descr/diabetes.rst

-.. _digits
+.. _digits:

 .. include:: ../../sklearn/datasets/descr/digits.rst

-.. _iris
+.. _iris:

 .. include:: ../../sklearn/datasets/descr/iris.rst

-.. _linnerud
+.. _linnerud:

 .. include:: ../../sklearn/datasets/descr/linnerud.rst
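
The targets fixed above anchor the bundled dataset descriptions, which are the same texts the loaders expose at runtime. A minimal sketch (standard loaders only, shapes noted for orientation)::

    from sklearn.datasets import load_boston, load_iris

    # Each toy-dataset loader bundles the description .rst file that the
    # corresponding label above (e.g. ``_iris``) now points to.
    iris = load_iris()
    print(iris.data.shape)      # (150, 4)
    print(iris.DESCR[:200])     # opening of datasets/descr/iris.rst

    boston = load_boston()
    print(boston.data.shape)    # (506, 13)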

doc/datasets/rcv1.rst

Lines changed: 2 additions & 2 deletions
@@ -41,10 +41,10 @@ There are 103 topics, each represented by a string. Their corpus frequencies spa
     >>> rcv1.target_names[:3].tolist()  # doctest: +SKIP
     ['E11', 'ECAT', 'M11']

-The dataset will be downloaded from the `dataset's homepage`_ if necessary.
+The dataset will be downloaded from the `rcv1 homepage`_ if necessary.
 The compressed size is about 656 MB.

-.. _dataset's homepage: http://jmlr.csail.mit.edu/papers/volume5/lewis04a/
+.. _rcv1 homepage: http://jmlr.csail.mit.edu/papers/volume5/lewis04a/


 .. topic:: References
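
A minimal sketch of the download behaviour described above, using the ``fetch_rcv1`` loader::

    from sklearn.datasets import fetch_rcv1

    # The first call downloads the compressed archive (about 656 MB) from
    # the rcv1 homepage and caches it locally; later calls reuse the cache.
    rcv1 = fetch_rcv1()
    print(rcv1.data.shape)                  # (804414, 47236), sparse CSR
    print(rcv1.target_names[:3].tolist())   # e.g. ['E11', 'ECAT', 'M11']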

doc/modules/decomposition.rst

Lines changed: 1 addition & 0 deletions
@@ -776,6 +776,7 @@ a corpus with :math:`D` documents and :math:`K` topics:
 2. For each document :math:`d`, draw :math:`\theta_d \sim Dirichlet(\alpha), \: d=1...D`

 3. For each word :math:`i` in document :math:`d`:
+
    a. Draw a topic index :math:`z_{di} \sim Multinomial(\theta_d)`
    b. Draw the observed word :math:`w_{ij} \sim Multinomial(beta_{z_{di}}.)`
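
The generative process above is what ``decomposition.LatentDirichletAllocation`` implements. A minimal sketch, assuming this version's ``n_topics`` keyword for :math:`K` (``doc_topic_prior`` and ``topic_word_prior`` correspond to the Dirichlet hyperparameters)::

    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation

    # Toy bag-of-words counts: 6 documents over a vocabulary of 8 terms.
    rng = np.random.RandomState(0)
    X = rng.randint(0, 5, size=(6, 8))

    lda = LatentDirichletAllocation(n_topics=2, random_state=0)
    doc_topic = lda.fit_transform(X)   # rows approximate theta_d
    topic_word = lda.components_       # unnormalized topic-word weights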

doc/modules/feature_selection.rst

Lines changed: 2 additions & 0 deletions
@@ -153,6 +153,8 @@ For examples on how it is to be used refer to the sections below.
   most important features from the Boston dataset without knowing the
   threshold beforehand.

+.. _l1_feature_selection:
+
 L1-based feature selection
 --------------------------
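
With the new target in place, the section can be cross-referenced via the ``l1_feature_selection`` label. A minimal sketch of what it documents, using ``SelectFromModel`` with an L1-penalized estimator::

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectFromModel
    from sklearn.svm import LinearSVC

    iris = load_iris()
    X, y = iris.data, iris.target

    # The L1 penalty drives some coefficients exactly to zero;
    # SelectFromModel keeps the features with non-zero coefficients.
    lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X, y)
    model = SelectFromModel(lsvc, prefit=True)
    X_new = model.transform(X)   # shape (150, 3): one feature dropped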

doc/modules/gaussian_process.rst

Lines changed: 33 additions & 26 deletions
@@ -67,12 +67,15 @@ level from the data (see example below).

 The implementation is based on Algorithm 2.1 of [RW2006]_. In addition to
 the API of standard sklearn estimators, GaussianProcessRegressor:
-   * allows prediction without prior fitting (based on the GP prior)
-   * provides an additional method ``sample_y(X)``, which evaluates samples
-     drawn from the GPR (prior or posterior) at given inputs
-   * exposes a method ``log_marginal_likelihood(theta)``, which can be used
-     externally for other ways of selecting hyperparameters, e.g., via
-     Markov chain Monte Carlo.
+
+   * allows prediction without prior fitting (based on the GP prior)
+
+   * provides an additional method ``sample_y(X)``, which evaluates samples
+     drawn from the GPR (prior or posterior) at given inputs
+
+   * exposes a method ``log_marginal_likelihood(theta)``, which can be used
+     externally for other ways of selecting hyperparameters, e.g., via
+     Markov chain Monte Carlo.


 GPR examples

@@ -171,26 +174,30 @@ model the CO2 concentration as a function of the time t.

 The kernel is composed of several terms that are responsible for explaining
 different properties of the signal:
-- a long term, smooth rising trend is to be explained by an RBF kernel. The
-  RBF kernel with a large length-scale enforces this component to be smooth;
-  it is not enforced that the trend is rising which leaves this choice to the
-  GP. The specific length-scale and the amplitude are free hyperparameters.
-- a seasonal component, which is to be explained by the periodic
-  ExpSineSquared kernel with a fixed periodicity of 1 year. The length-scale
-  of this periodic component, controlling its smoothness, is a free parameter.
-  In order to allow decaying away from exact periodicity, the product with an
-  RBF kernel is taken. The length-scale of this RBF component controls the
-  decay time and is a further free parameter.
-- smaller, medium term irregularities are to be explained by a
-  RationalQuadratic kernel component, whose length-scale and alpha parameter,
-  which determines the diffuseness of the length-scales, are to be determined.
-  According to [RW2006]_, these irregularities can better be explained by
-  a RationalQuadratic than an RBF kernel component, probably because it can
-  accommodate several length-scales.
-- a "noise" term, consisting of an RBF kernel contribution, which shall
-  explain the correlated noise components such as local weather phenomena,
-  and a WhiteKernel contribution for the white noise. The relative amplitudes
-  and the RBF's length scale are further free parameters.
+
+- a long term, smooth rising trend is to be explained by an RBF kernel. The
+  RBF kernel with a large length-scale enforces this component to be smooth;
+  it is not enforced that the trend is rising which leaves this choice to the
+  GP. The specific length-scale and the amplitude are free hyperparameters.
+
+- a seasonal component, which is to be explained by the periodic
+  ExpSineSquared kernel with a fixed periodicity of 1 year. The length-scale
+  of this periodic component, controlling its smoothness, is a free parameter.
+  In order to allow decaying away from exact periodicity, the product with an
+  RBF kernel is taken. The length-scale of this RBF component controls the
+  decay time and is a further free parameter.
+
+- smaller, medium term irregularities are to be explained by a
+  RationalQuadratic kernel component, whose length-scale and alpha parameter,
+  which determines the diffuseness of the length-scales, are to be determined.
+  According to [RW2006]_, these irregularities can better be explained by
+  a RationalQuadratic than an RBF kernel component, probably because it can
+  accommodate several length-scales.
+
+- a "noise" term, consisting of an RBF kernel contribution, which shall
+  explain the correlated noise components such as local weather phenomena,
+  and a WhiteKernel contribution for the white noise. The relative amplitudes
+  and the RBF's length scale are further free parameters.

 Maximizing the log-marginal-likelihood after subtracting the target's mean
 yields the following kernel with an LML of -83.214:
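
A minimal sketch tying the two hunks above together: the three ``GaussianProcessRegressor`` extensions, plus a composite kernel in the spirit of the CO2 example (the hyperparameter values are illustrative placeholders, not the tuned ones from the example)::

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import (
        RBF, ExpSineSquared, RationalQuadratic, WhiteKernel)

    # Long-term trend + decaying seasonality + medium-term irregularities
    # + correlated and white noise, as in the bullet list above.
    kernel = (50.0**2 * RBF(length_scale=50.0)
              + 2.0**2 * RBF(length_scale=100.0)
                       * ExpSineSquared(length_scale=1.0, periodicity=1.0)
              + 0.5**2 * RationalQuadratic(length_scale=1.0, alpha=1.0)
              + 0.1**2 * RBF(length_scale=0.1)
              + WhiteKernel(noise_level=0.1**2))

    X = np.linspace(0, 10, 30)[:, np.newaxis]
    y = np.sin(X).ravel()

    gpr = GaussianProcessRegressor(kernel=kernel)

    # 1. Prediction without prior fitting draws on the GP prior.
    prior_mean, prior_std = gpr.predict(X, return_std=True)

    gpr.fit(X, y)

    # 2. sample_y evaluates draws from the (now posterior) GP at X.
    samples = gpr.sample_y(X, n_samples=3)

    # 3. log_marginal_likelihood at arbitrary (log-transformed) theta,
    #    usable from an external sampler, e.g. MCMC.
    lml = gpr.log_marginal_likelihood(gpr.kernel_.theta)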

doc/modules/multiclass.rst

Lines changed: 4 additions & 4 deletions
@@ -215,7 +215,7 @@ code book. The code size is the dimensionality of the aforementioned space.
 Intuitively, each class should be represented by a code as unique as
 possible and a good code book should be designed to optimize classification
 accuracy. In this implementation, we simply use a randomly-generated code
-book as advocated in [2]_ although more elaborate methods may be added in the
+book as advocated in [3]_ although more elaborate methods may be added in the
 future.

 At fitting time, one binary classifier per bit in the code book is fitted.

@@ -262,16 +262,16 @@ Below is an example of multiclass learning using Output-Codes::

 .. topic:: References:

-    .. [1] "Solving multiclass learning problems via error-correcting output codes",
+    .. [2] "Solving multiclass learning problems via error-correcting output codes",
        Dietterich T., Bakiri G.,
        Journal of Artificial Intelligence Research 2,
        1995.

-    .. [2] "The error coding method and PICTs",
+    .. [3] "The error coding method and PICTs",
        James G., Hastie T.,
        Journal of Computational and Graphical statistics 7,
        1998.

-    .. [3] "The Elements of Statistical Learning",
+    .. [4] "The Elements of Statistical Learning",
        Hastie T., Tibshirani R., Friedman J., page 606 (second-edition)
        2008.
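
A minimal sketch of the randomly-generated code book in use, via ``OutputCodeClassifier``::

    from sklearn.datasets import load_iris
    from sklearn.multiclass import OutputCodeClassifier
    from sklearn.svm import LinearSVC

    iris = load_iris()

    # code_size=2 asks for a code book with 2 * n_classes bits; the book
    # itself is drawn at random, as the paragraph above notes.
    clf = OutputCodeClassifier(LinearSVC(random_state=0),
                               code_size=2, random_state=0)
    y_pred = clf.fit(iris.data, iris.target).predict(iris.data)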

doc/modules/neural_networks_supervised.rst

Lines changed: 3 additions & 3 deletions
@@ -153,7 +153,7 @@ See the examples below and the doc string of

 .. topic:: Examples:

-  * :ref:`example_plot_mlp_alpha.py`
+  * :ref:`example_neural_networks_plot_mlp_alpha.py`


 Regression

@@ -175,7 +175,7 @@ Algorithms
 MLP trains using `Stochastic Gradient Descent
 <http://en.wikipedia.org/wiki/Stochastic_gradient_descent>`_,
 `Adam <http://arxiv.org/abs/1412.6980>`_, or
-`L-BFGS <http://en.wikipedia.org/wiki/Limited-memory_BFGS>`_.
+`L-BFGS <http://en.wikipedia.org/wiki/Limited-memory_BFGS>`__.
 Stochastic Gradient Descent (SGD) updates parameters using the gradient of the
 loss function with respect to a parameter that needs adaptation, i.e.

@@ -201,7 +201,7 @@ L-BFGS is a fast learning algorithm that approximates the Hessian matrix which
 represents the second-order partial derivative of a function. Further it
 approximates the inverse of the Hessian matrix to perform parameter updates.
 The implementation uses the Scipy version of
-`L-BFGS <http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html>`_..
+`L-BFGS <http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html>`__..

 If the selected algorithm is 'L-BFGS', training does not support online nor
 mini-batch learning.
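
A minimal sketch of choosing the training algorithm discussed above, using this development version's ``algorithm`` keyword (renamed ``solver`` in later releases)::

    from sklearn.neural_network import MLPClassifier

    X = [[0., 0.], [1., 1.]]
    y = [0, 1]

    # L-BFGS is a batch method: as noted above, it supports neither online
    # nor mini-batch learning; 'sgd' and 'adam' are the online options.
    clf = MLPClassifier(algorithm='l-bfgs', hidden_layer_sizes=(5, 2),
                        random_state=1)
    clf.fit(X, y)
    print(clf.predict([[2., 2.], [-1., -2.]]))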

doc/modules/outlier_detection.rst

Lines changed: 1 addition & 0 deletions
@@ -169,6 +169,7 @@ This strategy is illustrated below.
    :class:`covariance.MinCovDet`.

 .. topic:: References:
+
     .. [LTZ2008] Liu, Fei Tony, Ting, Kai Ming and Zhou, Zhi-Hua. "Isolation forest."
         Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on.
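
A minimal sketch of the isolation-forest strategy of [LTZ2008]_, using ``ensemble.IsolationForest``::

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.RandomState(42)
    X_train = 0.3 * rng.randn(100, 2)                      # regular data
    X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))

    clf = IsolationForest(random_state=rng)
    clf.fit(X_train)
    print(clf.predict(X_outliers))   # +1 for inliers, -1 for outliers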

doc/whats_new.rst

Lines changed: 9 additions & 9 deletions
@@ -33,7 +33,7 @@ Enhancements
    that takes in the data and yields a generator for the different splits.
    This change makes it possible to do nested cross-validation with ease,
    facilitated by :class:`model_selection.GridSearchCV` and similar
-   utilities. (`#4294 https://github.com/scikit-learn/scikit-learn/pull/4294>`_) by `Raghav R V`_.
+   utilities. (`#4294 <https://github.com/scikit-learn/scikit-learn/pull/4294>`_) by `Raghav R V`_.

 - The random forest, extra trees and decision tree estimators now has a
   method ``decision_path`` which returns the decision path of samples in

@@ -56,16 +56,16 @@ Bug fixes
 .........

 - :class:`RandomizedPCA` default number of `iterated_power` is 2 instead of 3.
-  This is a speed up with a minor precision decrease. (`#5141 https://github.com/scikit-learn/scikit-learn/pull/5141>`_) by `Giorgio Patrini`_.
+  This is a speed up with a minor precision decrease. (`#5141 <https://github.com/scikit-learn/scikit-learn/pull/5141>`_) by `Giorgio Patrini`_.

 - :func:`randomized_svd` performs 2 power iterations by default, instead or 0.
   In practice this is often enough for obtaining a good approximation of the
-  true eigenvalues/vectors in the presence of noise. (`#5141 https://github.com/scikit-learn/scikit-learn/pull/5141>`_) by `Giorgio Patrini`_.
+  true eigenvalues/vectors in the presence of noise. (`#5141 <https://github.com/scikit-learn/scikit-learn/pull/5141>`_) by `Giorgio Patrini`_.

 - :func:`randomized_range_finder` is more numerically stable when many
   power iterations are requested, since it applies LU normalization by default.
   If `n_iter<2` numerical issues are unlikely, thus no normalization is applied.
-  Other normalization options are available: 'none', 'LU' and 'QR'. (`#5141 https://github.com/scikit-learn/scikit-learn/pull/5141>`_) by `Giorgio Patrini`_.
+  Other normalization options are available: 'none', 'LU' and 'QR'. (`#5141 <https://github.com/scikit-learn/scikit-learn/pull/5141>`_) by `Giorgio Patrini`_.

 - Fixed bug in :func:`manifold.spectral_embedding` where diagonal of unnormalized
   Laplacian matrix was incorrectly set to 1. By `Peter Fischer`_.

@@ -85,7 +85,7 @@ API changes summary
 - The :mod:`cross_validation`, :mod:`grid_search` and :mod:`learning_curve`
   have been deprecated and the classes and functions have been reorganized into
   the :mod:`model_selection` module.
-  (`#4294 https://github.com/scikit-learn/scikit-learn/pull/4294>`_) by `Raghav R V`_.
+  (`#4294 <https://github.com/scikit-learn/scikit-learn/pull/4294>`_) by `Raghav R V`_.


 .. _changes_0_17:

@@ -366,7 +366,7 @@ Bug fixes

 - Fixed bug in :class:`cross_decomposition.PLS` that yielded unstable and
   platform dependent output, and failed on `fit_transform`.
-  By `Arthur Mensch`_.
+  By `Arthur Mensch`_.

 - Fixed a bug in :class:`linear_model.LogisticRegression` and
   :class:`linear_model.LogisticRegressionCV` when using

@@ -3403,8 +3403,8 @@ Changelog

 - New :ref:`gaussian_process` module by Vincent Dubourg. This module
   also has great documentation and some very neat examples. See
-  :ref:`example_gaussian_process_plot_gp_regression.py` or
-  :ref:`example_gaussian_process_plot_gp_probabilistic_classification_after_regression.py`
+  example_gaussian_process_plot_gp_regression.py or
+  example_gaussian_process_plot_gp_probabilistic_classification_after_regression.py
   for a taste of what can be done.

 - It is now possible to use liblinear’s Multi-class SVC (option

@@ -3866,4 +3866,4 @@ David Huard, Dave Morrill, Ed Schofield, Travis Oliphant, Pearu Peterson.
 .. _Graham Clenaghan: https://github.com/gclenaghan
 .. _Giorgio Patrini: https://github.com/giorgiop
 .. _Elvis Dohmatob: https://github.com/dohmatob
-.. _yelite https://github.com/yelite
+.. _yelite: https://github.com/yelite
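
A minimal sketch of the nested cross-validation workflow that the ``model_selection`` rework (#4294) enables::

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
    from sklearn.svm import SVC

    iris = load_iris()

    # The new splitters take the data in split(), so the same GridSearchCV
    # can be re-fit on every outer training fold.
    inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
    outer_cv = KFold(n_splits=4, shuffle=True, random_state=1)

    clf = GridSearchCV(SVC(), param_grid={'C': [1, 10]}, cv=inner_cv)
    nested_scores = cross_val_score(clf, iris.data, iris.target, cv=outer_cv)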

examples/applications/face_recognition.py

Lines changed: 3 additions & 1 deletion
@@ -12,8 +12,9 @@

 Expected results for the top 5 most represented people in the dataset::

+  ================== ============ ======= ========== =======
                      precision    recall  f1-score   support
-
+  ================== ============ ======= ========== =======
       Ariel Sharon       0.67      0.92      0.77       13
       Colin Powell       0.75      0.78      0.76       60
    Donald Rumsfeld       0.78      0.67      0.72       27

@@ -23,6 +24,7 @@
         Tony Blair       0.81      0.69      0.75       36

        avg / total       0.80      0.80      0.80      322
+  ================== ============ ======= ========== =======

 """
 from __future__ import print_function
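
A minimal sketch of how a table like the one above is produced, via ``metrics.classification_report`` (toy labels, not the LFW results)::

    from sklearn.metrics import classification_report

    y_true = [0, 1, 2, 2, 0]
    y_pred = [0, 0, 2, 2, 0]
    names = ['Ariel Sharon', 'Colin Powell', 'Donald Rumsfeld']

    # Prints per-class precision, recall, f1-score and support, plus the
    # avg / total row shown in the docstring table.
    print(classification_report(y_true, y_pred, target_names=names))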
