asawq2006
diff --git a/doc/modules/classes.rst b/doc/modules/classes.rst
Lines changed: 5 additions & 22 deletions
diff --git a/doc/modules/lda_qda.rst b/doc/modules/lda_qda.rst
Lines changed: 85 additions & 64 deletions
diff --git a/doc/modules/multiclass.rst b/doc/modules/multiclass.rst
Lines changed: 2 additions & 2 deletions
diff --git a/doc/modules/neighbors.rst b/doc/modules/neighbors.rst
Lines changed: 3 additions & 3 deletions
diff --git a/doc/whats_new.rst b/doc/whats_new.rst
Lines changed: 9 additions & 9 deletions
diff --git a/examples/classification/plot_classifier_comparison.py b/examples/classification/plot_classifier_comparison.py
Lines changed: 6 additions & 5 deletions
@@ -603,10 +603,10 @@ From text
 
 .. _lda_ref:
 
-:mod:`sklearn.lda`: Linear Discriminant Analysis
-================================================
+:mod:`sklearn.discriminant_analysis`: Discriminant Analysis
+===========================================================
 
-.. automodule:: sklearn.lda
+.. automodule:: sklearn.discriminant_analysis
    :no-members:
    :no-inherited-members:
 
@@ -618,7 +618,8 @@ From text
    :toctree: generated
    :template: class.rst
 
-   lda.LDA
+   discriminant_analysis.LinearDiscriminantAnalysis
+   discriminant_analysis.QuadraticDiscriminantAnalysis
 
 
 .. _learning_curve_ref:
@@ -1136,24 +1137,6 @@ See the :ref:`metrics` section of the user guide for further details.
    preprocessing.scale
 
 
-
-:mod:`sklearn.qda`: Quadratic Discriminant Analysis
-===================================================
-
-.. automodule:: sklearn.qda
-   :no-members:
-   :no-inherited-members:
-
-**User guide:** See the :ref:`lda_qda` section for further details.
-
-.. currentmodule:: sklearn
-
-.. autosummary::
-   :toctree: generated
-   :template: class.rst
-
-   qda.QDA
-
 .. _random_projection_ref:
 
 :mod:`sklearn.random_projection`: Random projection
 
@@ -1,112 +1,131 @@
 .. _lda_qda:
 
 ==========================================
-Linear and quadratic discriminant analysis
+Linear and Quadratic Discriminant Analysis
 ==========================================
 
 .. currentmodule:: sklearn
 
-Linear discriminant analysis (:class:`lda.LDA`) and
-quadratic discriminant analysis (:class:`qda.QDA`)
-are two standard classifiers, with, as their names suggest, a linear and a
-quadratic decision surface, respectively.
+Linear Discriminant Analysis
+(:class:`discriminant_analysis.LinearDiscriminantAnalysis`) and Quadratic
+Discriminant Analysis
+(:class:`discriminant_analysis.QuadraticDiscriminantAnalysis`) are two classic
+classifiers, with, as their names suggest, a linear and a quadratic decision
+surface, respectively.
 
 These classifiers are attractive because they have closed-form solutions that
-can be easily computed, are inherently multiclass, have proven to work well in practice and have
-no hyperparameters to tune.
+can be easily computed, are inherently multiclass, have proven to work well in
+practice and have no hyperparameters to tune.
 
 .. |ldaqda| image:: ../auto_examples/classification/images/plot_lda_qda_001.png
         :target: ../auto_examples/classification/plot_lda_qda.html
         :scale: 80
 
 .. centered:: |ldaqda|
 
-The plot shows decision boundaries for LDA and QDA. The first row shows that,
-when the classes covariances are the same, LDA and QDA yield the same result 
-(up to a small difference resulting from the implementation). The bottom row demonstrates that in general, 
-LDA can only learn linear boundaries, while QDA can learn
-quadratic boundaries and is therefore more flexible.
+The plot shows decision boundaries for Linear Discriminant Analysis and
+Quadratic Discriminant Analysis. The bottom row demonstrates that Linear
+Discriminant Analysis can only learn linear boundaries, while Quadratic
+Discriminant Analysis can learn quadratic boundaries and is therefore more
+flexible.
 
 .. topic:: Examples:
 
-    :ref:`example_classification_plot_lda_qda.py`: Comparison of LDA and QDA on synthetic data.
+    :ref:`example_classification_plot_lda_qda.py`: Comparison of LDA and QDA
+    on synthetic data.
 
-Dimensionality reduction using LDA
-==================================
-
-:class:`lda.LDA` can be used to perform supervised dimensionality reduction, by
-projecting the input data to a linear subspace consisting of the directions which maximize the
-separation between classes (in a precise sense discussed in the mathematics section below). 
-The dimension of the output is necessarily less that the number of classes, 
-so this is a in general a rather strong dimensionality reduction, and only makes senses 
-in a multiclass setting.
+Dimensionality reduction using Linear Discriminant Analysis
+===========================================================
 
-This is implemented in :func:`lda.LDA.transform`. The desired
-dimensionality can be set using the ``n_components`` constructor
-parameter. This parameter has no influence on :func:`lda.LDA.fit` or :func:`lda.LDA.predict`.
+:class:`discriminant_analysis.LinearDiscriminantAnalysis` can be used to
+perform supervised dimensionality reduction, by projecting the input data to a
+linear subspace consisting of the directions which maximize the separation
+between classes (in a precise sense discussed in the mathematics section
+below). The dimension of the output is necessarily less than the number of
+classes, so this is in general a rather strong dimensionality reduction, and
+only makes sense in a multiclass setting.
+
+This is implemented in
+:func:`discriminant_analysis.LinearDiscriminantAnalysis.transform`. The desired
+dimensionality can be set using the ``n_components`` constructor parameter.
+This parameter has no influence on
+:func:`discriminant_analysis.LinearDiscriminantAnalysis.fit` or
+:func:`discriminant_analysis.LinearDiscriminantAnalysis.predict`.
 
 .. topic:: Examples:
 
-    :ref:`example_decomposition_plot_pca_vs_lda.py`: Comparison of LDA and PCA for dimensionality reduction of the Iris dataset
+    :ref:`example_decomposition_plot_pca_vs_lda.py`: Comparison of LDA and PCA
+    for dimensionality reduction of the Iris dataset
 
 Mathematical formulation of the LDA and QDA classifiers
 =======================================================
 
-Both LDA and QDA can be derived from simple probabilistic models 
-which model the class conditional distribution of the data :math:`P(X|y=k)`
-for each class :math:`k`. Predictions can then be obtained by using Bayes' rule:
+Both LDA and QDA can be derived from simple probabilistic models which model
+the class conditional distribution of the data :math:`P(X|y=k)` for each class
+:math:`k`. Predictions can then be obtained by using Bayes' rule:
 
 .. math::
     P(y=k | X) = \frac{P(X | y=k) P(y=k)}{P(X)} = \frac{P(X | y=k) P(y = k)}{ \sum_{l} P(X | y=l) \cdot P(y=l)}
 
 and we select the class :math:`k` which maximizes this conditional probability.
 
-More specifically, for linear and quadratic discriminant analysis, :math:`P(X|y)`
-is modelled as a multivariate Gaussian distribution with density:
+More specifically, for linear and quadratic discriminant analysis,
+:math:`P(X|y)` is modelled as a multivariate Gaussian distribution with
+density:
 
 .. math:: p(X | y=k) = \frac{1}{(2\pi)^{n/2} |\Sigma_k|^{1/2}}\exp\left(-\frac{1}{2} (X-\mu_k)^t \Sigma_k^{-1} (X-\mu_k)\right)
 
-To use this model as a classifier, we just need to estimate from the training data
-the class priors :math:`P(y=k)` (by the proportion of instances of class :math:`k`), the
-class means :math:`\mu_k` (by the empirical sample class means) and the covariance matrices 
-(either by the empirical sample class covariance matrices, or by a regularized estimator: see the section on shrinkage below).
+To use this model as a classifier, we just need to estimate from the training
+data the class priors :math:`P(y=k)` (by the proportion of instances of class
+:math:`k`), the class means :math:`\mu_k` (by the empirical sample class means)
+and the covariance matrices (either by the empirical sample class covariance
+matrices, or by a regularized estimator: see the section on shrinkage below).
 
-In the case of LDA, the Gaussians for each class are assumed 
-to share the same covariance matrix: :math:`\Sigma_k = \Sigma` for all :math:`k`.
-This leads to linear decision surfaces between, as can be seen by comparing the the log-probability ratios
-:math:`\log[P(y=k | X) / P(y=l | X)]`:
+In the case of LDA, the Gaussians for each class are assumed to share the same
+covariance matrix: :math:`\Sigma_k = \Sigma` for all :math:`k`. This leads to
+linear decision surfaces between classes, as can be seen by comparing the
+log-probability ratios :math:`\log[P(y=k | X) / P(y=l | X)]`:
 
 .. math::
    \log\left(\frac{P(y=k|X)}{P(y=l | X)}\right) = 0 \Leftrightarrow (\mu_k-\mu_l)^t \Sigma^{-1} X = \frac{1}{2} (\mu_k^t \Sigma^{-1} \mu_k - \mu_l^t \Sigma^{-1} \mu_l) - \log\frac{P(y=k)}{P(y=l)}
 
-In the case of QDA, there are no assumptions on the covariance matrices :math:`\Sigma_k` of the Gaussians,
-leading to quadratic decision surfaces. See [#1]_ for more details.
+In the case of QDA, there are no assumptions on the covariance matrices
+:math:`\Sigma_k` of the Gaussians, leading to quadratic decision surfaces. See
+[#1]_ for more details.
 
 .. note:: **Relation with Gaussian Naive Bayes**
 
-	  If in the QDA model one assumes that the covariance matrices are diagonal, then
-	  this means that we assume the classes are conditionally independent,
-	  and the resulting classifier is equivalent to the Gaussian Naive Bayes classifier :class:`GaussianNB`.
+	  If in the QDA model one assumes that the covariance matrices are diagonal,
+	  then the features are assumed to be conditionally independent in each class,
+	  and the resulting classifier is equivalent to the Gaussian Naive Bayes
+	  classifier :class:`naive_bayes.GaussianNB`.
 
 Mathematical formulation of LDA dimensionality reduction
-===========================================================
+========================================================
 
 To understand the use of LDA in dimensionality reduction, it is useful to start
 with a geometric reformulation of the LDA classification rule explained above.
-We write :math:`K` for the total number of target classes.
-Since in LDA we assume that all classes have the same estimated covariance :math:`\Sigma`, we can rescale the 
-data so that this covariance is the identity:
+We write :math:`K` for the total number of target classes. Since in LDA we
+assume that all classes have the same estimated covariance :math:`\Sigma`, we
+can rescale the data so that this covariance is the identity:
 
 .. math:: X^* = D^{-1/2}U^t X\text{ with }\Sigma = UDU^t
 
-Then one can show that to classify a data point after scaling is equivalent to finding the estimated class mean :math:`\mu^*_k` which is 
-closest to the data point in the Euclidean distance. But this can be done just as well after projecting on the :math:`K-1` affine subspace :math:`H_K`
-generated by all the :math:`\mu^*_k` for all classes. This shows that, implicit in the LDA classifier, there is
-a dimensionality reduction by linear projection onto a :math:`K-1` dimensional space.
-
-We can reduce the dimension even more, to a chosen :math:`L`, by projecting onto the linear subspace :math:`H_L` which
-maximize the variance of the :math:`\mu^*_k` after projection (in effect, we are doing a form of PCA for the transformed class means :math:`\mu^*_k`).
-This :math:`L` corresponds to the ``n_components`` parameter in the :func:`lda.LDA.transform` method. See [#1]_ for more details.
+Then one can show that to classify a data point after scaling is equivalent to
+finding the estimated class mean :math:`\mu^*_k` which is closest to the data
+point in the Euclidean distance. But this can be done just as well after
+projecting on the :math:`K-1` dimensional affine subspace :math:`H_K`
+generated by all the :math:`\mu^*_k`. This shows that, implicit in the LDA
+classifier, there is a dimensionality reduction by linear projection onto a
+:math:`K-1` dimensional space.
+
+We can reduce the dimension even more, to a chosen :math:`L`, by projecting
+onto the linear subspace :math:`H_L` which maximizes the variance of the
+:math:`\mu^*_k` after projection (in effect, we are doing a form of PCA for the
+transformed class means :math:`\mu^*_k`). This :math:`L` corresponds to the
+``n_components`` parameter used in the
+:func:`discriminant_analysis.LinearDiscriminantAnalysis.transform` method. See
+[#1]_ for more details.
 
 Shrinkage
 =========
@@ -115,10 +134,11 @@ Shrinkage is a tool to improve estimation of covariance matrices in situations
 where the number of training samples is small compared to the number of
 features. In this scenario, the empirical sample covariance is a poor
 estimator. Shrinkage LDA can be used by setting the ``shrinkage`` parameter of
-the :class:`lda.LDA` class to 'auto'. This automatically determines the
-optimal shrinkage parameter in an analytic way following the lemma introduced
-by Ledoit and Wolf [#2]_. Note that currently shrinkage only works when setting the
-``solver`` parameter to 'lsqr' or 'eigen'.
+the :class:`discriminant_analysis.LinearDiscriminantAnalysis` class to 'auto'.
+This automatically determines the optimal shrinkage parameter in an analytic
+way following the lemma introduced by Ledoit and Wolf [#2]_. Note that
+currently shrinkage only works when setting the ``solver`` parameter to 'lsqr'
+or 'eigen'.
 
 The ``shrinkage`` parameter can also be manually set between 0 and 1. In
 particular, a value of 0 corresponds to no shrinkage (which means the empirical
@@ -154,12 +174,13 @@ a high number of features.
 
 .. topic:: Examples:
 
-    :ref:`example_classification_plot_lda.py`: Comparison of LDA classifiers with and without shrinkage.
+    :ref:`example_classification_plot_lda.py`: Comparison of LDA classifiers
+    with and without shrinkage.
 
 .. topic:: References:
 
    .. [#1] "The Elements of Statistical Learning", Hastie T., Tibshirani R.,
-        Friedman J., Section 4.3, p.106-119, 2008.
+      Friedman J., Section 4.3, p.106-119, 2008.
 
-   .. [#2] Ledoit O, Wolf M. Honey, I Shrunk the Sample Covariance Matrix. The Journal of Portfolio
-    Management 30(4), 110-119, 2004.
+   .. [#2] Ledoit O, Wolf M. Honey, I Shrunk the Sample Covariance Matrix.
+      The Journal of Portfolio Management 30(4), 110-119, 2004.
@@ -33,7 +33,7 @@ by decomposing such problems into binary classification problems.
     several joint classification tasks. This is a generalization
     of the multi-label classification task, where the set of classification
     problems is restricted to binary classification, and of the multi-class
-    classification task. *The output format is a 2d numpy array or sparse 
+    classification task. *The output format is a 2d numpy array or sparse
     matrix.*
 
     The set of labels can be different for each output variable.
@@ -65,7 +65,7 @@ if you're using one of these unless you want custom multiclass behavior:
     :ref:`Nearest Neighbors <neighbors>`,
     setting ``multi_class='multinomial'`` in
     :class:`sklearn.linear_model.LogisticRegression`.
-  - Support multilabel: :ref:`Decision Trees <tree>`, 
+  - Support multilabel: :ref:`Decision Trees <tree>`,
     :ref:`Random Forests <forest>`, :ref:`Nearest Neighbors <neighbors>`,
     :ref:`Ridge Regression <ridge_regression>`.
   - One-Vs-One: :class:`sklearn.svm.SVC`.
 
@@ -467,9 +467,9 @@ similar to the label updating phase of the :class:`sklearn.KMeans` algorithm.
 It also has no parameters to choose, making it a good baseline classifier. It
 does, however, suffer on non-convex classes, as well as when classes have
 drastically different variances, as equal variance in all dimensions is
-assumed. See Linear Discriminant Analysis (:class:`sklearn.lda.LDA`) and
-Quadratic Discriminant Analysis (:class:`sklearn.qda.QDA`) for more complex
-methods that do not make this assumption. Usage of the default
+assumed. See Linear Discriminant Analysis (:class:`sklearn.discriminant_analysis.LinearDiscriminantAnalysis`)
+and Quadratic Discriminant Analysis (:class:`sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis`)
+for more complex methods that do not make this assumption. Usage of the default
 :class:`NearestCentroid` is simple:
 
     >>> from sklearn.neighbors.nearest_centroid import NearestCentroid
 
@@ -350,8 +350,8 @@ New features
    - Add :class:`cluster.Birch`, an online clustering algorithm. By
      `Manoj Kumar`_, `Alexandre Gramfort`_ and `Joel Nothman`_.
 
-   - Added shrinkage support to :class:`lda.LDA` using two new solvers. By
-     `Clemens Brunner`_ and `Martin Billinger`_.
+   - Added shrinkage support to :class:`discriminant_analysis.LinearDiscriminantAnalysis`
+     using two new solvers. By `Clemens Brunner`_ and `Martin Billinger`_.
 
    - Added :class:`kernel_ridge.KernelRidge`, an implementation of
      kernelized ridge regression.
@@ -758,8 +758,8 @@ Bug fixes
   - Explicitly close open files to avoid ``ResourceWarnings`` under Python 3.
     By Calvin Giles.
 
-  - The ``transform`` of :class:`lda.LDA` now projects the input on the most
-    discriminant directions. By Martin Billinger.
+  - The ``transform`` of :class:`discriminant_analysis.LinearDiscriminantAnalysis`
+    now projects the input on the most discriminant directions. By Martin Billinger.
 
   - Fixed potential overflow in ``_tree.safe_realloc`` by `Lars Buitinck`_.
 
@@ -2266,9 +2266,9 @@ API changes summary
    - Fixed API inconsistency: :meth:`linear_model.SGDClassifier.predict_proba` now
      returns 2d array when fit on two classes.
 
-   - Fixed API inconsistency: :meth:`qda.QDA.decision_function` and
-     :meth:`lda.LDA.decision_function` now return 1d arrays when fit on two
-     classes.
+   - Fixed API inconsistency: :meth:`discriminant_analysis.QuadraticDiscriminantAnalysis.decision_function`
+     and :meth:`discriminant_analysis.LinearDiscriminantAnalysis.decision_function` now return 1d arrays
+     when fit on two classes.
 
    - Grid of alphas used for fitting :class:`linear_model.LassoCV` and
      :class:`linear_model.ElasticNetCV` is now stored
@@ -3053,8 +3053,8 @@ Some other modules benefited from significant improvements or cleanups.
 
   - Add attribute converged to Gaussian Mixture Models by Vincent Schut.
 
-  - Implemented ``transform``, ``predict_log_proba`` in :class:`lda.LDA`
-    By `Mathieu Blondel`_.
+  - Implemented ``transform``, ``predict_log_proba`` in
+    :class:`discriminant_analysis.LinearDiscriminantAnalysis`. By `Mathieu Blondel`_.
 
   - Refactoring in the :ref:`svm` module and bug fixes by `Fabian Pedregosa`_,
     `Gael Varoquaux`_ and Amit Aides.
 
@@ -39,13 +39,14 @@
 from sklearn.tree import DecisionTreeClassifier
 from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
 from sklearn.naive_bayes import GaussianNB
-from sklearn.lda import LDA
-from sklearn.qda import QDA
+from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
+from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
 
 h = .02  # step size in the mesh
 
 names = ["Nearest Neighbors", "Linear SVM", "RBF SVM", "Decision Tree",
-         "Random Forest", "AdaBoost", "Naive Bayes", "LDA", "QDA"]
+         "Random Forest", "AdaBoost", "Naive Bayes", "Linear Discriminant Analysis",
+         "Quadratic Discriminant Analysis"]
 classifiers = [
     KNeighborsClassifier(3),
     SVC(kernel="linear", C=0.025),
@@ -54,8 +55,8 @@
     RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),
     AdaBoostClassifier(),
     GaussianNB(),
-    LDA(),
-    QDA()]
+    LinearDiscriminantAnalysis(),
+    QuadraticDiscriminantAnalysis()]
 
 X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                            random_state=1, n_clusters_per_class=1)
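
For readers updating their own code after this rename, the following is a minimal sketch (not part of the patch) of how the relocated classes might be used. The synthetic dataset and its parameters are illustrative assumptions; ``n_components``, ``solver`` and ``shrinkage`` are the constructor parameters described in the ``lda_qda.rst`` hunk above.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

# Illustrative three-class dataset (an assumption, not taken from the patch).
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=0)

# Supervised dimensionality reduction: at most n_classes - 1 = 2 components.
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
X_reduced = lda.transform(X)

# Shrinkage requires the 'lsqr' or 'eigen' solver.
lda_shrunk = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X, y)

# QDA estimates one covariance matrix per class.
qda = QuadraticDiscriminantAnalysis().fit(X, y)

lda_pred = lda_shrunk.predict(X)
qda_pred = qda.predict(X)
```

Code that previously imported ``sklearn.lda.LDA`` or ``sklearn.qda.QDA`` would, under this change, switch to the two imports shown at the top of the sketch.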