@@ -67,15 +67,14 @@ slightly different sets of parameters and have different mathematical
 formulations (see section :ref:`svm_mathematical_formulation`). On the
 other hand, :class:`LinearSVC` is another implementation of Support
 Vector Classification for the case of a linear kernel. Note that
-:class:`LinearSVC` does not accept keyword 'kernel', as this is
+:class:`LinearSVC` does not accept keyword ``kernel``, as this is
 assumed to be linear. It also lacks some of the members of
-:class:`SVC` and :class:`NuSVC`, like support\_.
+:class:`SVC` and :class:`NuSVC`, like ``support_``.

 As other classifiers, :class:`SVC`, :class:`NuSVC` and
-:class:`LinearSVC` take as input two arrays: an array X of size
-[n_samples, n_features] holding the training samples, and an array Y
-of integer values, size [n_samples], holding the class labels for the
-training samples::
+:class:`LinearSVC` take as input two arrays: an array X of size ``[n_samples,
+n_features]`` holding the training samples, and an array Y of integer values,
+size ``[n_samples]``, holding the class labels for the training samples::


     >>> from sklearn import svm
@@ -94,8 +93,8 @@ After being fitted, the model can then be used to predict new values::

 SVMs' decision function depends on some subset of the training data,
 called the support vectors. Some properties of these support vectors
-can be found in members `support_vectors_`, `support_` and
-`n_support`::
+can be found in members ``support_vectors_``, ``support_`` and
+``n_support``::

     >>> # get support vectors
     >>> clf.support_vectors_
@@ -115,7 +114,7 @@ Multi-class classification

 :class:`SVC` and :class:`NuSVC` implement the "one-against-one"
 approach (Knerr et al., 1990) for multi-class classification. If
-n_class is the number of classes, then n_class * (n_class - 1)/2
+``n_class`` is the number of classes, then ``n_class * (n_class - 1) / 2``
 classifiers are constructed and each one trains data from two classes::

     >>> X = [[0], [1], [2], [3]]
@@ -147,7 +146,7 @@ the decision function.

 Note that the :class:`LinearSVC` also implements an alternative multi-class
 strategy, the so-called multi-class SVM formulated by Crammer and Singer, by
-using the option "multi_class='crammer_singer'". This method is consistent,
+using the option ``multi_class='crammer_singer'``. This method is consistent,
 which is not true for one-vs-rest classification.
 In practice, one-vs-rest classification is usually preferred, since the results
 are mostly similar, but the runtime is significantly less.
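+
+For example, this strategy can be selected as follows (a minimal sketch,
+reusing the ``X`` and ``Y`` arrays from the multi-class example above)::
+
+    >>> lin_clf = svm.LinearSVC(multi_class='crammer_singer')
+    >>> lin_clf.fit(X, Y)  # doctest: +ELLIPSIS
+    LinearSVC(...)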
@@ -161,9 +160,9 @@ order of the "one" class.
 In the case of "one-vs-one" :class:`SVC`, the layout of the attributes
 is a little more involved. In the case of having a linear kernel,
 the layout of ``coef_`` and ``intercept_`` is similar to the one
-described for :class:`LinearSVC` described above, except that
-the shape of ``coef_`` is ``[n_class * (n_class - 1) / 2``,
-corresponding to as many binary classifiers. The order for classes
+described for :class:`LinearSVC` above, except that the shape of
+``coef_`` is ``[n_class * (n_class - 1) / 2, n_features]``, corresponding to as
+many binary classifiers. The order for classes
 0 to n is "0 vs 1", "0 vs 2", ... "0 vs n", "1 vs 2", "1 vs 3", ... "1 vs n",
 ... "n-1 vs n".
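+
+For example, with the four-class toy data above (a single feature), six binary
+classifiers are expected, so ``coef_`` should have shape ``(6, 1)``. A minimal
+sketch, reusing the ``X`` and ``Y`` arrays from the multi-class example::
+
+    >>> lin_svc = svm.SVC(kernel='linear').fit(X, Y)
+    >>> lin_svc.coef_.shape
+    (6, 1)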
@@ -177,13 +176,13 @@ for these classifiers.

 This might be made clearer by an example:

-Consider a three class problem with with class 0 having 3 support vectors
-:math:`v^{0}_0, v^{1}_0, v^{2}_0` and class 1 and 2 having two support
-vectors :math:`v^{0}_1, v^{1}_1` and :math:`v^{0}_1, v^{1}_1` respectively.
-For each support vector :math:`v^{j}_i`, there are 2 dual coefficients.
-Let's call the coefficient of support vector :math:`v^{j}_i` in the
-classifier between classes `i` and `k` :math:`\alpha^{j}_{i,k}`.
-Then ``dual_coef_`` looks like this:
+Consider a three class problem with class 0 having three support vectors
+:math:`v^{0}_0, v^{1}_0, v^{2}_0` and classes 1 and 2 having two support
+vectors :math:`v^{0}_1, v^{1}_1` and :math:`v^{0}_2, v^{1}_2` respectively. For
+each support vector :math:`v^{j}_i`, there are two dual coefficients. Let's
+call the coefficient of support vector :math:`v^{j}_i` in the classifier
+between classes `i` and `k` :math:`\alpha^{j}_{i,k}`. Then ``dual_coef_``
+looks like this:

 +------------------------+------------------------+------------------+
 |:math:`\alpha^{0}_{0,1}`|:math:`\alpha^{0}_{0,2}`|Coefficients      |
@@ -210,9 +209,9 @@ classes or certain individual samples keywords ``class_weight`` and
 ``sample_weight`` can be used.

 :class:`SVC` (but not :class:`NuSVC`) implements a keyword
-``class_weight`` in the fit method. It's a dictionary of the form
+``class_weight`` in the ``fit`` method. It's a dictionary of the form
 ``{class_label : value}``, where value is a floating point number > 0
-that sets the parameter C of class ``class_label`` to C * value.
+that sets the parameter ``C`` of class ``class_label`` to ``C * value``.
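+
+A minimal sketch (with a hypothetical class label ``1``) of giving one class
+ten times the default ``C``::
+
+    >>> wclf = svm.SVC(class_weight={1: 10})
+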
 .. figure:: ../auto_examples/svm/images/plot_separating_hyperplane_unbalanced_1.png
    :target: ../auto_examples/svm/plot_separating_hyperplane_unbalanced.html
@@ -222,7 +221,7 @@ that sets the parameter C of class ``class_label`` to C * value.

 :class:`SVC`, :class:`NuSVC`, :class:`SVR`, :class:`NuSVR` and
 :class:`OneClassSVM` also implement weights for individual samples in method
-``fit`` through keyword sample_weight.
+``fit`` through keyword ``sample_weight``.
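+
+A minimal sketch (with small made-up data) of putting more weight on the last
+two samples::
+
+    >>> from sklearn import svm
+    >>> X_w = [[0, 0], [1, 1], [1, 0], [0, 1]]
+    >>> y_w = [0, 0, 1, 1]
+    >>> sw_clf = svm.SVC()
+    >>> sw_clf.fit(X_w, y_w, sample_weight=[1, 1, 10, 10])  # doctest: +ELLIPSIS
+    SVC(...)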

 .. figure:: ../auto_examples/svm/images/plot_weighted_samples_1.png
@@ -331,29 +330,31 @@ Tips on Practical Use
 =====================


-  * **Avoiding data copy**: For SVC, SVR, NuSVC and NuSVR, if the data
-    passed to certain methods is not C-ordered contiguous, and double
-    precision, it will be copied before calling the underlying C
-    implementation. You can check whether a give numpy array is
+  * **Avoiding data copy**: For :class:`SVC`, :class:`SVR`, :class:`NuSVC` and
+    :class:`NuSVR`, if the data passed to certain methods is not C-ordered
+    contiguous, and double precision, it will be copied before calling the
+    underlying C implementation. You can check whether a given numpy array is
     C-contiguous by inspecting its `flags` attribute.

-    For LinearSVC (and LogisticRegression) any input passed as a
-    numpy array will be copied and converted to the liblinear
-    internal sparse data representation (double precision floats
-    and int32 indices of non-zero components). If you want to fit
-    a large-scale linear classifier without copying a dense numpy
-    C-contiguous double precision array as input we suggest to use
-    the SGDClassifier class instead. The objective function can be
-    configured to be almost the same as the LinearSVC model.
-
-  * **Kernel cache size**: For SVC, SVR, nuSVC and NuSVR, the size of
-    the kernel cache has a strong impact on run times for larger
-    problems. If you have enough RAM available, it is recommended to
-    set `cache_size` to a higher value than the default of 200(MB),
-    such as 500(MB) or 1000(MB).
-
-  * **Setting C**: C is ``1`` by default and it's a reasonable default choice.
-    If you have a lot of noisy observations you should decrease it.
+    For :class:`LinearSVC` (and :class:`LogisticRegression
+    <sklearn.linear_model.LogisticRegression>`) any input passed as a numpy
+    array will be copied and converted to the liblinear internal sparse data
+    representation (double precision floats and int32 indices of non-zero
+    components). If you want to fit a large-scale linear classifier without
+    copying a dense numpy C-contiguous double precision array as input we
+    suggest using the :class:`SGDClassifier
+    <sklearn.linear_model.SGDClassifier>` class instead. The objective
+    function can be configured to be almost the same as the :class:`LinearSVC`
+    model.
+
+  * **Kernel cache size**: For :class:`SVC`, :class:`SVR`, :class:`NuSVC` and
+    :class:`NuSVR`, the size of the kernel cache has a strong impact on run
+    times for larger problems. If you have enough RAM available, it is
+    recommended to set ``cache_size`` to a higher value than the default of
+    200 MB, such as 500 MB or 1000 MB.
+
+  * **Setting C**: ``C`` is ``1`` by default and it's a reasonable default
+    choice. If you have a lot of noisy observations you should decrease it.
     Decreasing it corresponds to more regularization.

   * Support Vector Machine algorithms are not scale invariant, so **it
@@ -363,24 +364,24 @@ Tips on Practical Use
     applied to the test vector to obtain meaningful results. See section
     :ref:`preprocessing` for more details on scaling and normalization.

-  * Parameter nu in NuSVC/OneClassSVM/NuSVR approximates the fraction
-    of training errors and support vectors.
+  * Parameter ``nu`` in :class:`NuSVC`/:class:`OneClassSVM`/:class:`NuSVR`
+    approximates the fraction of training errors and support vectors.

-  * In SVC, if data for classification are unbalanced (e.g. many
-    positive and few negative), set class_weight='auto' and/or try
-    different penalty parameters C.
+  * In :class:`SVC`, if data for classification are unbalanced (e.g. many
+    positive and few negative), set ``class_weight='auto'`` and/or try
+    different penalty parameters ``C``.

   * The underlying :class:`LinearSVC` implementation uses a random
     number generator to select features when fitting the model. It is
     thus not uncommon to have slightly different results for the same
     input data. If that happens, try with a smaller ``tol`` parameter.

-  * Using L1 penalization as provided by LinearSVC(loss='l2',
-    penalty='l1', dual=False) yields a sparse solution, i.e. only a subset of
-    feature weights is different from zero and contribute to the decision
-    function. Increasing C yields a more complex model (more feature are
-    selected). The C value that yields a "null" model (all weights equal to
-    zero) can be calculated using :func:`l1_min_c`.
+  * Using L1 penalization as provided by ``LinearSVC(loss='l2', penalty='l1',
+    dual=False)`` yields a sparse solution, i.e. only a subset of feature
+    weights is different from zero and contributes to the decision function.
+    Increasing ``C`` yields a more complex model (more features are selected).
+    The ``C`` value that yields a "null" model (all weights equal to zero) can
+    be calculated using :func:`l1_min_c`, as sketched below.
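+
+As a minimal sketch of the L1 penalization tip above (with small made-up data;
+the choice ``C=10 * min_c`` is only illustrative)::
+
+    >>> from sklearn.svm import LinearSVC, l1_min_c
+    >>> X = [[0, 0], [1, 1], [2, 2], [3, 3]]
+    >>> y = [0, 0, 1, 1]
+    >>> min_c = l1_min_c(X, y)  # smallest C giving a non-"null" model
+    >>> sparse_clf = LinearSVC(loss='l2', penalty='l1', dual=False, C=10 * min_c)
+    >>> sparse_clf.fit(X, y)  # doctest: +ELLIPSIS
+    LinearSVC(...)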


 .. _svm_kernels:
@@ -420,20 +421,19 @@ python function or by precomputing the Gram matrix.
 Classifiers with custom kernels behave the same way as any other
 classifiers, except that:

-    * Field `support_vectors\_` is now empty, only indices of support
-      vectors are stored in `support_`
+    * Field ``support_vectors_`` is now empty, only indices of support
+      vectors are stored in ``support_``

-    * A reference (and not a copy) of the first argument in the fit()
-      method is stored for future reference. If that array changes
-      between the use of fit() and predict() you will have unexpected
-      results.
+    * A reference (and not a copy) of the first argument in the ``fit()``
+      method is stored for future reference. If that array changes between the
+      use of ``fit()`` and ``predict()`` you will have unexpected results.


-Using python functions as kernels
+Using Python functions as kernels
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 You can also use your own defined kernels by passing a function to the
-keyword `kernel` in the constructor.
+keyword ``kernel`` in the constructor.

 Your kernel must take as arguments two matrices and return a third matrix.

@@ -454,9 +454,9 @@ instance that will use that kernel::
 Using the Gram matrix
 ~~~~~~~~~~~~~~~~~~~~~

-Set kernel='precomputed' and pass the Gram matrix instead of X in the
-fit method. At the moment, the kernel values between `all` training
-vectors and the test vectors must be provided.
+Set ``kernel='precomputed'`` and pass the Gram matrix instead of X in the fit
+method. At the moment, the kernel values between `all` training vectors and the
+test vectors must be provided.

     >>> import numpy as np
     >>> from sklearn import svm
@@ -476,17 +476,16 @@ vectors and the test vectors must be provided.
 Parameters of the RBF Kernel
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-When training an SVM with the *Radial Basis Function* (RBF) kernel,
-two parameters must be considered: `C` and `gamma`. The parameter `C`,
-common to all SVM kernels, trades off misclassification of training
-examples against simplicity of the decision surface. A low `C` makes
-the decision surface smooth, while a high `C` aims at classifying all
-training examples correctly. `gamma` defines how much influence a
-single training example has. The larger `gamma` is, the closer other
-examples must be to be affected.
-
-Proper choice of `C` and `gamma` is critical to the SVM's performance.
-One is advised to use :class:`GridSearchCV` with `C` and `gamma` spaced
+When training an SVM with the *Radial Basis Function* (RBF) kernel, two
+parameters must be considered: ``C`` and ``gamma``. The parameter ``C``,
+common to all SVM kernels, trades off misclassification of training examples
+against simplicity of the decision surface. A low ``C`` makes the decision
+surface smooth, while a high ``C`` aims at classifying all training examples
+correctly. ``gamma`` defines how much influence a single training example has.
+The larger ``gamma`` is, the closer other examples must be to be affected.
+
+Proper choice of ``C`` and ``gamma`` is critical to the SVM's performance. One
+is advised to use :class:`GridSearchCV` with ``C`` and ``gamma`` spaced
 exponentially far apart to choose good values.
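+
+A minimal sketch of such a search (the exact import path is an assumption:
+:class:`GridSearchCV` lives in ``sklearn.grid_search`` in older releases and in
+``sklearn.model_selection`` in newer ones)::
+
+    >>> import numpy as np
+    >>> from sklearn.svm import SVC
+    >>> from sklearn.grid_search import GridSearchCV
+    >>> param_grid = {'C': np.logspace(-2, 3, 6), 'gamma': np.logspace(-4, 1, 6)}
+    >>> grid = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=3)
+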
 .. topic:: Examples:
@@ -514,9 +513,9 @@ generalization error of the classifier.
 SVC
 ---

-Given training vectors :math:`x_i \in R^p`, i=1,..., n, in two
-classes, and a vector :math:`y \in R^n` such that :math:`y_i \in {1,
--1}`, SVC solves the following primal problem:
+Given training vectors :math:`x_i \in R^p`, i=1,..., n, in two classes, and a
+vector :math:`y \in R^n` such that :math:`y_i \in \{1, -1\}`, SVC solves the
+following primal problem:


 .. math::
@@ -538,22 +537,22 @@ Its dual is
     \textrm{subject to } & y^T \alpha = 0 \\
     & 0 \leq \alpha_i \leq C, i=1, ..., n

-where :math:`e` is the vector of all ones, C > 0 is the upper bound, Q
-is an n by n positive semidefinite matrix, :math:`Q_ij \equiv K(x_i,
-x_j)` and :math:`\phi (x_i)^T \phi (x)` is the kernel. Here training
-vectors are mapped into a higher (maybe infinite) dimensional space by
-the function :math:`\phi`.
+where :math:`e` is the vector of all ones, :math:`C > 0` is the upper bound,
+:math:`Q` is an `n` by `n` positive semidefinite matrix, :math:`Q_{ij} \equiv
+y_i y_j K(x_i, x_j)`, where :math:`K(x_i, x_j) = \phi(x_i)^T \phi(x_j)` is the
+kernel. Here training vectors are mapped into a higher (maybe infinite)
+dimensional space by the function :math:`\phi`.


 The decision function is:

-.. math:: sgn(\sum_{i=1}^n y_i \alpha_i K(x_i, x) + \rho)
+.. math:: \operatorname{sgn}(\sum_{i=1}^n y_i \alpha_i K(x_i, x) + \rho)

 .. note::

-    While SVM models derived from libsvm and liblinear use *C* as regularization
-    parameter, most other estimators use *alpha*. The relation between both is
-    :math:`C = \frac{n\_samples}{alpha}`.
+    While SVM models derived from `libsvm`_ and `liblinear`_ use ``C`` as
+    regularization parameter, most other estimators use ``alpha``. The relation
+    between both is :math:`C = \frac{n\_samples}{alpha}`.

 .. TODO multiclass case ?/
