11 | 11 | 'close'. The ``gamma`` parameter can be seen as the inverse of the radius of
12 | 12 | influence of samples selected by the model as support vectors.
13 | 13 |
14 |    | -The ``C`` parameter trades off misclassification of training examples against
15 |    | -simplicity of the decision surface. A low ``C`` makes the decision surface
16 |    | -smooth, while a high ``C`` aims at classifying all training examples correctly
17 |    | -by giving the model freedom to select more samples as support vectors.
   | 14 | +The ``C`` parameter trades off correct classification of training examples
   | 15 | +against maximization of the decision function's margin. For larger values of
   | 16 | +``C``, a smaller margin will be accepted if the decision function is better at
   | 17 | +classifying all training points correctly. A lower ``C`` will encourage a larger
   | 18 | +margin, therefore a simpler decision function, at the cost of training accuracy.
   | 19 | +In other words, ``C`` behaves as a regularization parameter in the SVM.
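
To make the effect of ``C`` concrete, here is a minimal sketch fitting an
RBF-kernel ``SVC`` at a few ``C`` values; the synthetic dataset and the
specific values are illustrative assumptions, not taken from the example::

    # Illustrative sketch (assumed data and values): ``C`` acting as a
    # regularization parameter for an RBF-kernel SVC.
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                               random_state=0)

    for C in (0.01, 1.0, 100.0):
        clf = SVC(kernel="rbf", gamma=1.0, C=C).fit(X, y)
        # A low C tolerates misclassified points (larger margin, simpler
        # decision function); a high C pushes toward fitting every point.
        print(f"C={C:>6}: train accuracy={clf.score(X, y):.3f}, "
              f"support vectors={clf.n_support_.sum()}")
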
18 | 20 |
19 | 21 | The first plot is a visualization of the decision function for a variety of
20 | 22 | parameter values on a simplified classification problem involving only 2 input

46 | 48 |
47 | 49 | For intermediate values, we can see on the second plot that good models can
48 | 50 | be found on a diagonal of ``C`` and ``gamma``. Smooth models (lower ``gamma``
49 |    | -values) can be made more complex by selecting a larger number of support
50 |    | -vectors (larger ``C`` values) hence the diagonal of good performing models.
   | 51 | +values) can be made more complex by increasing the importance of classifying
   | 52 | +each point correctly (larger ``C`` values) hence the diagonal of good performing
   | 53 | +models.
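
One way to see that diagonal is a small grid search over logarithmic ranges
of ``C`` and ``gamma``; the dataset and grid below are assumptions for
illustration, not the example's exact setup::

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    # Logarithmic grids; with a dict param_grid, results are ordered with
    # ``gamma`` varying fastest, so the reshape below is (C, gamma).
    param_grid = {"C": np.logspace(-2, 4, 7), "gamma": np.logspace(-5, 1, 7)}
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)

    # Good mean CV scores tend to line up along a diagonal of this grid:
    # a lower gamma can be compensated by a higher C.
    scores = search.cv_results_["mean_test_score"].reshape(7, 7)
    print(search.best_params_)
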
51 | 54 |
52 | 55 | Finally one can also observe that for some intermediate values of ``gamma`` we
53 |    | -get equally performing models when ``C`` becomes very large: it is not
54 |    | -necessary to regularize by limiting the number of support vectors. The radius of
55 |    | -the RBF kernel alone acts as a good structural regularizer. In practice though
56 |    | -it might still be interesting to limit the number of support vectors with a
57 |    | -lower value of ``C`` so as to favor models that use less memory and that are
58 |    | -faster to predict.
   | 56 | +get equally performing models when ``C`` becomes very large: it is not necessary
   | 57 | +to regularize by enforcing a larger margin. The radius of the RBF kernel alone
   | 58 | +acts as a good structural regularizer. In practice though it might still be
   | 59 | +interesting to simplify the decision function with a lower value of ``C`` so as
   | 60 | +to favor models that use less memory and that are faster to predict.
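
A rough check of that trade-off (the dataset and the ``gamma`` and ``C``
values are assumed here for illustration)::

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    for C in (1.0, 1e6):
        clf = SVC(kernel="rbf", gamma=0.1, C=C)
        score = cross_val_score(clf, X, y, cv=5).mean()
        n_sv = clf.fit(X, y).n_support_.sum()
        # Comparable CV scores, but fewer support vectors means less
        # memory and faster prediction.
        print(f"C={C:g}: mean CV score={score:.3f}, support vectors={n_sv}")
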
59 | 61 |
60 | 62 | We should also note that small differences in scores result from the random
61 | 63 | splits of the cross-validation procedure. Those spurious variations can be