
Just curious about two options in scikit-learn's SVM class. What do `scale_C` and `shrinking` do? There wasn't much in the documentation. `scale_C` seems to scale the C parameter appropriately for the training data.

Thanks

ogrisel
tomas

1 Answer


`scale_C=True` (deprecated in the dev version and scheduled for removal in 0.12) causes the regularization parameter `C` to be divided by the number of samples before it is passed to the underlying LibSVM implementation.
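
A minimal sketch of the arithmetic described above, assuming the division by the number of samples is the only thing `scale_C=True` did. `effective_C` is a hypothetical helper, not part of scikit-learn; it returns the value that would reach LibSVM, which is also the `C` you would pass explicitly to a version without `scale_C` to reproduce an old `scale_C=True` setup.

```python
def effective_C(C, n_samples, scale_C=True):
    """Hypothetical helper: the C value LibSVM would actually receive.

    Assumes scale_C=True simply divides C by n_samples (as described
    above) and scale_C=False passes C through unchanged.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return C / n_samples if scale_C else C


# An old setup with C=10 and scale_C=True on 500 samples corresponds to
# passing C=0.02 directly when no scaling is applied.
print(effective_C(10.0, 500))                 # 0.02
print(effective_C(10.0, 500, scale_C=False))  # 10.0
```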

`shrinking` enables or disables the "shrinking heuristic" described by Joachims (1999), which is intended to speed up SVM training.

Fred Foo
  • Ah okay thanks. How does dividing C by the number of samples help with the SVM training? – tomas Mar 09 '12 at 23:04
  • @tomas: setting it to `True` makes the regularization independent of the number of samples. When it is set to `False`, you have to rescale it yourself: double the regularization strength (i.e. halve C) when the number of samples doubles, etc. I suggest you always set it to `True`; I believe that's going to be the future behavior. – Fred Foo Mar 09 '12 at 23:13
  • In any case, the value of C should be selected by cross-validated grid search on the development set. Neither setting is intrinsically better than the other. You just need to know that C is no longer scaled by the number of samples in the dev version of scikit-learn, in case you are interested in its absolute value and how it is used in the objective function (e.g. for publishing in a scientific journal). – ogrisel Mar 10 '12 at 06:30
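
To make the sample-size dependence concrete, here is a plain-Python sketch of the standard soft-margin linear SVM primal objective, 0.5·||w||² + C·Σ hinge loss. The function names and toy data are hypothetical, and this is not how LibSVM solves the problem (it optimizes the dual); it only evaluates the objective at a fixed (w, b) to show how the loss term scales. Duplicating every sample doubles the number of samples without changing the data distribution.

```python
def hinge_losses(w, b, X, y):
    """Per-sample hinge loss max(0, 1 - y_i * (w . x_i + b))."""
    return [max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
            for xi, yi in zip(X, y)]


def loss_term(w, b, X, y, C, scale_C=False):
    """C_eff * sum of hinge losses, with C_eff = C / n_samples when scale_C is True."""
    c_eff = C / len(X) if scale_C else C
    return c_eff * sum(hinge_losses(w, b, X, y))


def primal_objective(w, b, X, y, C, scale_C=False):
    """0.5 * ||w||^2 + loss term; the regularizer does not depend on n_samples."""
    return 0.5 * sum(wj * wj for wj in w) + loss_term(w, b, X, y, C, scale_C)


# Hypothetical toy data and a fixed candidate (w, b); nothing is trained here.
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0], [-0.2, 0.0]]
y = [1, 1, -1, -1]
w, b = [1.0, 0.0], 0.0

# Duplicate every sample: n_samples doubles, the distribution stays the same.
X2, y2 = X * 2, y * 2
for scale_C in (False, True):
    ratio = loss_term(w, b, X2, y2, 1.0, scale_C) / loss_term(w, b, X, y, 1.0, scale_C)
    print(f"scale_C={scale_C}: loss term grows by a factor of {ratio:.2f}")
# scale_C=False: loss term grows by a factor of 2.00
# scale_C=True: loss term grows by a factor of 1.00
```

With unscaled C the loss term doubles while the 0.5·||w||² regularizer stays put, so the same C puts relatively less weight on regularization as the training set grows; dividing by the number of samples keeps that balance fixed. Either way, C itself is still best chosen by cross-validated grid search.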