Commit ee5811b

DOC: fixed latex and formatting in SVM docs
1 parent 94215eb commit ee5811b

File tree

1 file changed: +88 -89 lines changed


doc/modules/svm.rst

Lines changed: 88 additions & 89 deletions
@@ -67,15 +67,14 @@ slightly different sets of parameters and have different mathematical
 formulations (see section :ref:`svm_mathematical_formulation`). On the
 other hand, :class:`LinearSVC` is another implementation of Support
 Vector Classification for the case of a linear kernel. Note that
-:class:`LinearSVC` does not accept keyword 'kernel', as this is
+:class:`LinearSVC` does not accept keyword ``kernel``, as this is
 assumed to be linear. It also lacks some of the members of
-:class:`SVC` and :class:`NuSVC`, like support\_.
+:class:`SVC` and :class:`NuSVC`, like ``support_``.

 As other classifiers, :class:`SVC`, :class:`NuSVC` and
-:class:`LinearSVC` take as input two arrays: an array X of size
-[n_samples, n_features] holding the training samples, and an array Y
-of integer values, size [n_samples], holding the class labels for the
-training samples::
+:class:`LinearSVC` take as input two arrays: an array X of size ``[n_samples,
+n_features]`` holding the training samples, and an array Y of integer values,
+size ``[n_samples]``, holding the class labels for the training samples::


     >>> from sklearn import svm
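As a quick orientation for this hunk, here is a minimal end-to-end sketch of the fit/predict usage described above (the toy data and default parameters are illustrative assumptions, not part of the commit)::

    from sklearn import svm

    # X: [n_samples, n_features] training samples; y: [n_samples] integer labels.
    X = [[0.0, 0.0], [1.0, 1.0]]
    y = [0, 1]

    clf = svm.SVC()                     # default parameters
    clf.fit(X, y)                       # learn from the two training samples
    print(clf.predict([[2.0, 2.0]]))    # predict the class of a new sample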
@@ -94,8 +93,8 @@ After being fitted, the model can then be used to predict new values::

 SVMs decision function depends on some subset of the training data,
 called the support vectors. Some properties of these support vectors
-can be found in members `support_vectors_`, `support_` and
-`n_support`::
+can be found in members ``support_vectors_``, ``support_`` and
+``n_support``::

     >>> # get support vectors
     >>> clf.support_vectors_
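To illustrate the members named in this hunk, a small sketch (toy data assumed; note that released scikit-learn spells the per-class count ``n_support_``, with a trailing underscore)::

    from sklearn import svm

    X = [[0, 0], [1, 1], [2, 2], [3, 3]]
    y = [0, 0, 1, 1]
    clf = svm.SVC(kernel='linear').fit(X, y)

    print(clf.support_vectors_)   # the support vectors themselves
    print(clf.support_)           # indices of the support vectors in X
    print(clf.n_support_)         # number of support vectors for each class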
@@ -115,7 +114,7 @@ Multi-class classification

 :class:`SVC` and :class:`NuSVC` implement the "one-against-one"
 approach (Knerr et al., 1990) for multi- class classification. If
-n_class is the number of classes, then n_class * (n_class - 1)/2
+``n_class`` is the number of classes, then ``n_class * (n_class - 1) / 2``
 classifiers are constructed and each one trains data from two classes::

     >>> X = [[0], [1], [2], [3]]
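The classifier count mentioned above can be checked directly; a sketch with hypothetical four-class data::

    from sklearn import svm

    X = [[0], [1], [2], [3]]
    y = [0, 1, 2, 3]                          # four classes
    clf = svm.SVC().fit(X, y)

    n_class = len(clf.classes_)               # 4
    print(n_class * (n_class - 1) // 2)       # 6 underlying one-vs-one classifiers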
@@ -147,7 +146,7 @@ the decision function.

 Note that the :class:`LinearSVC` also implements an alternative multi-class
 strategy, the so-called multi-class SVM formulated by Crammer and Singer, by
-using the option "multi_class='crammer_singer'". This method is consistent,
+using the option ``multi_class='crammer_singer'``. This method is consistent,
 which is not true for one-vs-rest classification.
 In practice, on-vs-rest classification is usually preferred, since the results
 are mostly similar, but the runtime is significantly less.
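A minimal comparison of the two :class:`LinearSVC` multi-class strategies discussed here (toy data assumed)::

    from sklearn.svm import LinearSVC

    X = [[0, 0], [1, 1], [2, 2]]
    y = [0, 1, 2]

    ovr = LinearSVC().fit(X, y)                               # one-vs-rest (default)
    cs = LinearSVC(multi_class='crammer_singer').fit(X, y)    # Crammer-Singer formulation
    print(ovr.predict([[1, 1]]), cs.predict([[1, 1]]))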
@@ -161,9 +160,9 @@ order of the "one" class.
 In the case of "one-vs-one" :class:`SVC`, the layout of the attributes
 is a little more involved. In the case of having a linear kernel,
 The layout of ``coef_`` and ``intercept_`` is similar to the one
-described for :class:`LinearSVC` described above, except that
-the shape of ``coef_`` is ``[n_class * (n_class - 1) / 2``,
-corresponding to as many binary classifiers. The order for classes
+described for :class:`LinearSVC` described above, except that the shape of
+``coef_`` is ``[n_class * (n_class - 1) / 2, n_features]``, corresponding to as
+many binary classifiers. The order for classes
 0 to n is "0 vs 1", "0 vs 2" , ... "0 vs n", "1 vs 2", "1 vs 3", "1 vs n", . .
 . "n-1 vs n".

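The ``coef_`` shape described above can be verified on a toy four-class problem (an illustrative sketch, not part of the commit)::

    from sklearn import svm

    X = [[0, 0], [1, 1], [2, 2], [3, 3]]
    y = [0, 1, 2, 3]                 # four classes, two features
    clf = svm.SVC(kernel='linear').fit(X, y)

    # One row per one-vs-one classifier: 4 * 3 / 2 = 6 rows.
    print(clf.coef_.shape)           # (6, 2)
    print(clf.intercept_.shape)      # (6,)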
@@ -177,13 +176,13 @@ for these classifiers.

 This might be made more clear by an example:

-Consider a three class problem with with class 0 having 3 support vectors
-:math:`v^{0}_0, v^{1}_0, v^{2}_0` and class 1 and 2 having two support
-vectors :math:`v^{0}_1, v^{1}_1` and :math:`v^{0}_1, v^{1}_1` respectively.
-For each support vector :math:`v^{j}_i`, there are 2 dual coefficients.
-Let's call the coefficient of support vector :math:`v^{j}_i` in the
-classifier between classes `i` and `k` :math:`\alpha^{j}_{i,k}`.
-Then ``dual_coef_`` looks like this:
+Consider a three class problem with with class 0 having three support vectors
+:math:`v^{0}_0, v^{1}_0, v^{2}_0` and class 1 and 2 having two support vectors
+:math:`v^{0}_1, v^{1}_1` and :math:`v^{0}_1, v^{1}_1` respectively. For each
+support vector :math:`v^{j}_i`, there are two dual coefficients. Let's call
+the coefficient of support vector :math:`v^{j}_i` in the classifier between
+classes `i` and `k` :math:`\alpha^{j}_{i,k}`. Then ``dual_coef_`` looks like
+this:

 +------------------------+------------------------+------------------+
 |:math:`\alpha^{0}_{0,1}`|:math:`\alpha^{0}_{0,2}`|Coefficients      |
@@ -210,9 +209,9 @@ classes or certain individual samples keywords ``class_weight`` and
 ``sample_weight`` can be used.

 :class:`SVC` (but not :class:`NuSVC`) implement a keyword
-``class_weight`` in the fit method. It's a dictionary of the form
+``class_weight`` in the ``fit`` method. It's a dictionary of the form
 ``{class_label : value}``, where value is a floating point number > 0
-that sets the parameter C of class ``class_label`` to C * value.
+that sets the parameter ``C`` of class ``class_label`` to ``C * value``.

 .. figure:: ../auto_examples/svm/images/plot_separating_hyperplane_unbalanced_1.png
    :target: ../auto_examples/svm/plot_separating_hyperplane_unbalanced.html
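A sketch of the weighting described above, assuming the form of ``class_weight`` that is passed to the constructor (in older releases it was a ``fit`` argument, as the text says)::

    from sklearn import svm

    X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2]]
    y = [0, 0, 0, 0, 1]                     # class 1 is under-represented

    # Penalize errors on class 1 ten times harder, i.e. use C * 10 for that class.
    clf = svm.SVC(class_weight={1: 10})
    clf.fit(X, y)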
@@ -222,7 +221,7 @@ that sets the parameter C of class ``class_label`` to C * value.

 :class:`SVC`, :class:`NuSVC`, :class:`SVR`, :class:`NuSVR` and
 :class:`OneClassSVM` implement also weights for individual samples in method
-``fit`` through keyword sample_weight.
+``fit`` through keyword ``sample_weight``.


 .. figure:: ../auto_examples/svm/images/plot_weighted_samples_1.png
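And a corresponding sketch for per-sample weights (illustrative data and weights)::

    from sklearn import svm

    X = [[0, 0], [1, 1], [2, 2], [3, 3]]
    y = [0, 0, 1, 1]
    weights = [1.0, 1.0, 5.0, 1.0]          # the third sample counts five times as much

    clf = svm.SVC()
    clf.fit(X, y, sample_weight=weights)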
@@ -331,29 +330,31 @@ Tips on Practical Use
 =====================


-* **Avoiding data copy**: For SVC, SVR, NuSVC and NuSVR, if the data
-  passed to certain methods is not C-ordered contiguous, and double
-  precision, it will be copied before calling the underlying C
-  implementation. You can check whether a give numpy array is
+* **Avoiding data copy**: For :class:`SVC`, :class:`SVR`, :class:`NuSVC` and
+  :class:`NuSVR`, if the data passed to certain methods is not C-ordered
+  contiguous, and double precision, it will be copied before calling the
+  underlying C implementation. You can check whether a give numpy array is
   C-contiguous by inspecting its `flags` attribute.

-  For LinearSVC (and LogisticRegression) any input passed as a
-  numpy array will be copied and converted to the liblinear
-  internal sparse data representation (double precision floats
-  and int32 indices of non-zero components). If you want to fit
-  a large-scale linear classifier without copying a dense numpy
-  C-contiguous double precision array as input we suggest to use
-  the SGDClassifier class instead. The objective function can be
-  configured to be almost the same as the LinearSVC model.
-
-* **Kernel cache size**: For SVC, SVR, nuSVC and NuSVR, the size of
-  the kernel cache has a strong impact on run times for larger
-  problems. If you have enough RAM available, it is recommended to
-  set `cache_size` to a higher value than the default of 200(MB),
-  such as 500(MB) or 1000(MB).
-
-* **Setting C**: C is ``1`` by default and it's a reasonable default choice.
-  If you have a lot of noisy observations you should decrease it.
+  For :class:`LinearSVC` (and :class:`LogisticRegression
+  <sklearn.linear_model.LogisticRegression>`) any input passed as a numpy
+  array will be copied and converted to the liblinear internal sparse data
+  representation (double precision floats and int32 indices of non-zero
+  components). If you want to fit a large-scale linear classifier without
+  copying a dense numpy C-contiguous double precision array as input we
+  suggest to use the :class:`SGDClassifier
+  <sklearn.linear_model.SGDClassifier>` class instead. The objective
+  function can be configured to be almost the same as the :class:`LinearSVC`
+  model.
+
+* **Kernel cache size**: For :class:`SVC`, :class:`SVR`, :class:`nuSVC` and
+  :class:`NuSVR`, the size of the kernel cache has a strong impact on run
+  times for larger problems. If you have enough RAM available, it is
+  recommended to set ``cache_size`` to a higher value than the default of
+  200(MB), such as 500(MB) or 1000(MB).
+
+* **Setting C**: ``C`` is ``1`` by default and it's a reasonable default
+  choice. If you have a lot of noisy observations you should decrease it.
   It corresponds to regularize more the estimation.

 * Support Vector Machine algorithms are not scale invariant, so **it
@@ -363,24 +364,24 @@ Tips on Practical Use
   applied to the test vector to obtain meaningful results. See section
   :ref:`preprocessing` for more details on scaling and normalization.

-* Parameter nu in NuSVC/OneClassSVM/NuSVR approximates the fraction
-  of training errors and support vectors.
+* Parameter ``nu`` in :class:`NuSVC`/:class:`OneClassSVM`/:class:`NuSVR`
+  approximates the fraction of training errors and support vectors.

-* In SVC, if data for classification are unbalanced (e.g. many
-  positive and few negative), set class_weight='auto' and/or try
-  different penalty parameters C.
+* In :class:`SVC`, if data for classification are unbalanced (e.g. many
+  positive and few negative), set ``class_weight='auto'`` and/or try
+  different penalty parameters ``C``.

 * The underlying :class:`LinearSVC` implementation uses a random
   number generator to select features when fitting the model. It is
   thus not uncommon, to have slightly different results for the same
   input data. If that happens, try with a smaller tol parameter.

-* Using L1 penalization as provided by LinearSVC(loss='l2',
-  penalty='l1', dual=False) yields a sparse solution, i.e. only a subset of
-  feature weights is different from zero and contribute to the decision
-  function. Increasing C yields a more complex model (more feature are
-  selected). The C value that yields a "null" model (all weights equal to
-  zero) can be calculated using :func:`l1_min_c`.
+* Using L1 penalization as provided by ``LinearSVC(loss='l2', penalty='l1',
+  dual=False)`` yields a sparse solution, i.e. only a subset of feature
+  weights is different from zero and contribute to the decision function.
+  Increasing ``C`` yields a more complex model (more feature are selected).
+  The ``C`` value that yields a "null" model (all weights equal to zero) can
+  be calculated using :func:`l1_min_c`.


 .. _svm_kernels:
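To accompany the L1 penalization tip above, a sketch using :func:`l1_min_c` (hypothetical random data; note that recent scikit-learn releases spell the loss ``'squared_hinge'`` rather than ``'l2'``)::

    import numpy as np
    from sklearn.svm import LinearSVC, l1_min_c

    rng = np.random.RandomState(0)
    X = rng.randn(20, 5)
    y = np.array([0] * 10 + [1] * 10)

    c_min = l1_min_c(X, y, loss='squared_hinge')   # smallest C giving a non-"null" model

    clf = LinearSVC(penalty='l1', loss='squared_hinge', dual=False, C=10 * c_min)
    clf.fit(X, y)
    print(np.sum(clf.coef_ != 0))                  # number of selected (non-zero) features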
@@ -420,20 +421,19 @@ python function or by precomputing the Gram matrix.
 Classifiers with custom kernels behave the same way as any other
 classifiers, except that:

-* Field `support_vectors\_` is now empty, only indices of support
-  vectors are stored in `support_`
+* Field ``support_vectors_`` is now empty, only indices of support
+  vectors are stored in ``support_``

-* A reference (and not a copy) of the first argument in the fit()
-  method is stored for future reference. If that array changes
-  between the use of fit() and predict() you will have unexpected
-  results.
+* A reference (and not a copy) of the first argument in the ``fit()``
+  method is stored for future reference. If that array changes between the
+  use of ``fit()`` and ``predict()`` you will have unexpected results.


-Using python functions as kernels
+Using Python functions as kernels
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 You can also use your own defined kernels by passing a function to the
-keyword `kernel` in the constructor.
+keyword ``kernel`` in the constructor.

 Your kernel must take as arguments two matrices and return a third matrix.

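A minimal custom-kernel sketch matching the description above (the linear kernel function is an illustrative choice)::

    import numpy as np
    from sklearn import svm

    def my_kernel(X, Y):
        # Takes two matrices and returns the matrix of pairwise kernel values.
        return np.dot(np.asarray(X), np.asarray(Y).T)

    X = [[0, 0], [1, 1], [2, 2], [3, 3]]
    y = [0, 0, 1, 1]
    clf = svm.SVC(kernel=my_kernel).fit(X, y)

    print(clf.support_)             # only the indices of the support vectors are kept
    print(clf.predict([[2, 2]]))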
@@ -454,9 +454,9 @@ instance that will use that kernel::
 Using the Gram matrix
 ~~~~~~~~~~~~~~~~~~~~~

-Set kernel='precomputed' and pass the Gram matrix instead of X in the
-fit method. At the moment, the kernel values between `all` training
-vectors and the test vectors must be provided.
+Set ``kernel='precomputed'`` and pass the Gram matrix instead of X in the fit
+method. At the moment, the kernel values between `all` training vectors and the
+test vectors must be provided.

     >>> import numpy as np
     >>> from sklearn import svm
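For reference, a self-contained sketch of the precomputed-kernel workflow described in this hunk (a plain linear Gram matrix is assumed)::

    import numpy as np
    from sklearn import svm

    X = np.array([[0, 0], [1, 1], [2, 2], [3, 3]], dtype=float)
    y = [0, 0, 1, 1]

    clf = svm.SVC(kernel='precomputed')
    gram_train = np.dot(X, X.T)             # kernel values between all training vectors
    clf.fit(gram_train, y)

    X_test = np.array([[1, 1], [3, 3]], dtype=float)
    gram_test = np.dot(X_test, X.T)         # kernel values between test and training vectors
    print(clf.predict(gram_test))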
@@ -476,17 +476,16 @@ vectors and the test vectors must be provided.
 Parameters of the RBF Kernel
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-When training an SVM with the *Radial Basis Function* (RBF) kernel,
-two parameters must be considered: `C` and `gamma`. The parameter `C`,
-common to all SVM kernels, trades off misclassification of training
-examples against simplicity of the decision surface. A low `C` makes
-the decision surface smooth, while a high `C` aims at classifying all
-training examples correctly. `gamma` defines how much influence a
-single training example has. The larger `gamma` is, the closer other
-examples must be to be affected.
-
-Proper choice of `C` and `gamma` is critical to the SVM's performance.
-One is advised to use :class:`GridSearchCV` with `C` and `gamma` spaced
+When training an SVM with the *Radial Basis Function* (RBF) kernel, two
+parameters must be considered: ``C`` and ``gamma``. The parameter ``C``,
+common to all SVM kernels, trades off misclassification of training examples
+against simplicity of the decision surface. A low ``C`` makes the decision
+surface smooth, while a high ``C`` aims at classifying all training examples
+correctly. ``gamma`` defines how much influence a single training example has.
+The larger ``gamma`` is, the closer other examples must be to be affected.
+
+Proper choice of ``C`` and ``gamma`` is critical to the SVM's performance. One
+is advised to use :class:`GridSearchCV` with ``C`` and ``gamma`` spaced
 exponentially far apart to choose good values.

 .. topic:: Examples:
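A sketch of the exponentially spaced grid search recommended above (assuming the modern ``sklearn.model_selection`` import path for :class:`GridSearchCV` and synthetic data)::

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV

    rng = np.random.RandomState(0)
    X = np.r_[rng.randn(20, 2) - 1, rng.randn(20, 2) + 1]
    y = [0] * 20 + [1] * 20

    # Exponentially spaced grids for C and gamma.
    param_grid = {'C': np.logspace(-2, 2, 5), 'gamma': np.logspace(-3, 1, 5)}
    search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=3)
    search.fit(X, y)
    print(search.best_params_)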
@@ -514,9 +513,9 @@ generalization error of the classifier.
 SVC
 ---

-Given training vectors :math:`x_i \in R^p`, i=1,..., n, in two
-classes, and a vector :math:`y \in R^n` such that :math:`y_i \in {1,
--1}`, SVC solves the following primal problem:
+Given training vectors :math:`x_i \in R^p`, i=1,..., n, in two classes, and a
+vector :math:`y \in R^n` such that :math:`y_i \in \{1, -1\}`, SVC solves the
+following primal problem:


 .. math::
@@ -538,22 +537,22 @@ Its dual is
     \textrm {subject to } & y^T \alpha = 0\\
     & 0 \leq \alpha_i \leq C, i=1, ..., l

-where :math:`e` is the vector of all ones, C > 0 is the upper bound, Q
-is an n by n positive semidefinite matrix, :math:`Q_ij \equiv K(x_i,
-x_j)` and :math:`\phi (x_i)^T \phi (x)` is the kernel. Here training
-vectors are mapped into a higher (maybe infinite) dimensional space by
-the function :math:`\phi`.
+where :math:`e` is the vector of all ones, :math:`C > 0` is the upper bound,
+:math:`Q` is an `n` by `n` positive semidefinite matrix, :math:`Q_{ij} \equiv
+K(x_i, x_j)` and :math:`\phi (x_i)^T \phi (x)` is the kernel. Here training
+vectors are mapped into a higher (maybe infinite) dimensional space by the
+function :math:`\phi`.


 The decision function is:

-.. math:: sgn(\sum_{i=1}^n y_i \alpha_i K(x_i, x) + \rho)
+.. math:: \operatorname{sgn}(\sum_{i=1}^n y_i \alpha_i K(x_i, x) + \rho)

 .. note::

-    While SVM models derived from libsvm and liblinear use *C* as regularization
-    parameter, most other estimators use *alpha*. The relation between both is
-    :math:`C = \frac{n\_samples}{alpha}`.
+    While SVM models derived from `libsvm`_ and `liblinear`_ use ``C`` as
+    regularization parameter, most other estimators use ``alpha``. The relation
+    between both is :math:`C = \frac{n\_samples}{alpha}`.

 .. TODO multiclass case ?/

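The decision function written above can be reconstructed from the fitted attributes; a sketch for the binary RBF case (``dual_coef_`` stores the products :math:`y_i \alpha_i`, and the sign convention of ``intercept_`` matches ``decision_function``)::

    import numpy as np
    from sklearn import svm

    X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
    y = [0, 0, 1, 1]
    clf = svm.SVC(kernel='rbf', gamma=0.5).fit(X, y)

    x_new = np.array([[2.5, 2.5]])
    # K(x_i, x_new) for every support vector x_i, with the RBF kernel used above.
    K = np.exp(-0.5 * np.sum((clf.support_vectors_ - x_new) ** 2, axis=1))
    manual = np.dot(clf.dual_coef_[0], K) + clf.intercept_[0]
    print(manual, clf.decision_function(x_new)[0])   # the two values agree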