'close'. The ``gamma`` parameter can be seen as the inverse of the radius of
influence of samples selected by the model as support vectors.

The ``C`` parameter trades off correct classification of training examples
against maximization of the decision function's margin. For larger values of
``C``, a smaller margin will be accepted if the decision function is better at
classifying all training points correctly. A lower ``C`` will encourage a
larger margin, and therefore a simpler decision function, at the cost of
training accuracy. In other words, ``C`` behaves as a regularization parameter
in the SVM.

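As a minimal sketch of how these two parameters are used in practice, the
snippet below fits RBF-kernel SVMs for a few hand-picked values of ``C`` and
``gamma``; the synthetic dataset and the specific values are illustrative
assumptions, not the setup of this example::

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Toy two-feature binary problem (a placeholder, not this example's data).
    X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                               n_redundant=0, random_state=0)

    # A larger C fits the training points harder; a larger gamma shrinks the
    # radius of influence of each support vector.
    for C in (0.01, 1.0, 100.0):
        for gamma in (0.1, 1.0, 10.0):
            clf = make_pipeline(StandardScaler(),
                                SVC(kernel="rbf", C=C, gamma=gamma))
            score = cross_val_score(clf, X, y, cv=5).mean()
            print(f"C={C:<6g} gamma={gamma:<4g} mean CV accuracy={score:.3f}")

Scaling the features first is a deliberate choice here: ``gamma`` acts on the
input space, so its useful range depends on the scale of the features.
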
The first plot is a visualization of the decision function for a variety of
parameter values on a simplified classification problem involving only 2 input
features.

For intermediate values, we can see on the second plot that good models can
be found on a diagonal of ``C`` and ``gamma``. Smooth models (lower ``gamma``
values) can be made more complex by increasing the importance of classifying
each point correctly (larger ``C`` values), hence the diagonal of
well-performing models.

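A heatmap like the one in the second plot is typically built from a
cross-validated grid search over logarithmic ranges of both parameters. The
sketch below shows one way to obtain such a score matrix; the grid bounds,
split strategy and dataset are assumptions for illustration::

    import numpy as np

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, random_state=0)  # placeholder data

    # Logarithmic grids: the diagonal of good models usually spans several
    # orders of magnitude in both parameters.
    C_range = np.logspace(-2, 3, 6)
    gamma_range = np.logspace(-4, 1, 6)
    cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
    grid = GridSearchCV(SVC(kernel="rbf"),
                        param_grid={"C": C_range, "gamma": gamma_range},
                        cv=cv)
    grid.fit(X, y)

    print("best parameters:", grid.best_params_)
    # Reshaping the mean validation scores to (len(C_range), len(gamma_range))
    # gives the matrix that a heatmap of validation accuracy would display.
    scores = grid.cv_results_["mean_test_score"].reshape(len(C_range),
                                                         len(gamma_range))
    print(scores)
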
Finally, one can also observe that for some intermediate values of ``gamma`` we
get equally performing models when ``C`` becomes very large: it is not
necessary to regularize by enforcing a larger margin. The radius of the RBF
kernel alone acts as a good structural regularizer. In practice, though, it
might still be interesting to simplify the decision function with a lower
value of ``C`` so as to favor models that use less memory and that are faster
to predict.

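One way to check this on a given dataset is to hold ``gamma`` fixed at an
intermediate value and sweep ``C`` over several orders of magnitude, watching
both the cross-validated score and the size of the fitted model; as before,
the data and the chosen values are placeholders::

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, random_state=0)  # placeholder data
    gamma = 0.01  # assumed "intermediate" value; the useful range is data dependent

    for C in (1, 10, 100, 1000, 10000):
        clf = SVC(kernel="rbf", C=C, gamma=gamma)
        mean_score = cross_val_score(clf, X, y, cv=5).mean()
        # The number of support vectors kept after fitting is what drives
        # memory use and prediction time for a kernel SVM.
        n_sv = clf.fit(X, y).support_vectors_.shape[0]
        print(f"C={C:<6d} mean CV accuracy={mean_score:.3f} support vectors={n_sv}")
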
We should also note that small differences in scores result from the random
splits of the cross-validation procedure. Those spurious variations can be