
Hi, I am performing SVM classification using SMO with an RBF kernel. I now want to select the C and sigma values using grid search and cross-validation. I am new to kernel functions; please help with a step-by-step process.

om-nom-nom
pradeep deep

4 Answers

  1. Pick some values for C and sigma that you think are interesting. E.g., C = {1, 10, 100, 1000} and sigma = {.01, .1, 1} (I'm just making these up).
  2. Divide the training set into k (e.g. 10) parts, preferably in a stratified way.
  3. Loop over all pairs of C and sigma values.
    1. Loop over the k parts of your training set: for each part, hold it out, train a classifier on all of the other parts combined, then test on the held-out part.
    2. Keep track of some score (accuracy, F1, or whatever you want to optimize).
  4. Return the (C, sigma) pair that performed best according to the scores you just computed.
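The steps above can be sketched in Python. This is a minimal sketch, not a definitive implementation: the synthetic dataset, the candidate values, and the use of scikit-learn are illustrative assumptions (note that scikit-learn parameterizes the RBF kernel by gamma, with gamma = 1 / (2 * sigma**2)).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Illustrative data; replace with your own training set.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Step 1: candidate values (made up, as in the answer).
Cs = [1, 10, 100, 1000]
sigmas = [0.01, 0.1, 1]

# Step 2: divide the training set into k stratified parts.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

best_score, best_params = -1.0, None
# Step 3: loop over all (C, sigma) pairs.
for C in Cs:
    for sigma in sigmas:
        gamma = 1.0 / (2.0 * sigma ** 2)  # convert sigma to scikit-learn's gamma
        scores = []
        # Step 3.1: hold out each part in turn, train on the rest combined.
        for train_idx, test_idx in cv.split(X, y):
            clf = SVC(kernel="rbf", C=C, gamma=gamma)
            clf.fit(X[train_idx], y[train_idx])
            # Step 3.2: track a score (accuracy here).
            scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
        mean_score = float(np.mean(scores))
        if mean_score > best_score:
            best_score, best_params = mean_score, (C, sigma)

# Step 4: the best-performing (C, sigma) pair by cross-validated score.
print(best_params, round(best_score, 3))
```

You would then retrain a final classifier on the full training set using the selected pair.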
Fred Foo
  • To clarify: "Train a classifier on each of the other parts" generally means train it on the other parts combined, not each one individually. – Danica Mar 18 '12 at 18:32
  • @Dougal: yes, that's what I meant. Thanks. – Fred Foo Mar 18 '12 at 20:03
  • I think it's worth noting that what you are referring to (dividing into k parts, etc.) is called cross-validation, specifically 10-fold cross-validation. The OP may not know that; sometimes the hardest part of trying to find more information is knowing what keywords to search for. – karenu Mar 19 '12 at 14:48
  • hi, thanks for your answer, but could you say how to code it? – pradeep deep Mar 29 '12 at 10:21
  • @pradeepdeep: I just gave you the algorithm, you'll have to code it yourself. – Fred Foo Mar 29 '12 at 10:27
  • This will ease things: use GridSearchCV if you are using scikit-learn in Python. You specify the different parameters and values along with the number of cross-validation folds (e.g. 5), and then you can get the best score and best parameters for the estimator. It's the same as what is answered above, but in Python. – MANU Jul 23 '17 at 10:31
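The scikit-learn approach mentioned in the comment above can be sketched as follows; the dataset and grid values are illustrative assumptions, and note that scikit-learn's RBF kernel takes gamma = 1 / (2 * sigma**2) rather than sigma.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative data; replace with your own training set.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Candidate values; gamma plays the role of sigma in scikit-learn's RBF kernel.
param_grid = {"C": [1, 10, 100, 1000], "gamma": [0.01, 0.1, 1]}

# 5-fold cross-validated grid search over all (C, gamma) pairs.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```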

Read A Practical Guide to Support Vector Classification by Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin. They address this exact issue and explain methods for performing a grid search for parameter selection. http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf

karenu

I will just add a little bit of explanation to larsmans' answer.

The C parameter is a regularization/slack parameter. Smaller values force the weights to be small; as C grows, the allowed range of weights widens. Consequently, larger C values increase the penalty for misclassification and thus reduce the classification error rate on the training data (which may lead to over-fitting). Your training time and number of support vectors will increase as you increase the value of C.
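As a small illustration of the effect of C (a sketch on synthetic data; the dataset and the specific C values are assumptions, and the exact numbers printed are not meaningful in themselves):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Noisy synthetic data (flip_y adds label noise) so over-fitting is possible.
X, y = make_classification(n_samples=200, n_features=5, flip_y=0.1,
                           random_state=0)

results = {}
for C in [0.01, 1, 100]:
    clf = SVC(kernel="rbf", C=C).fit(X, y)
    train_acc = clf.score(X, y)              # accuracy on the training data
    n_sv = int(clf.n_support_.sum())         # total number of support vectors
    results[C] = (train_acc, n_sv)
    print(f"C={C:>6}: training accuracy={train_acc:.3f}, support vectors={n_sv}")
```

You should see the training accuracy rise (or at least not fall) as C grows, consistent with the stronger penalty on misclassification described above.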

You may also find it useful to read Extending SVM to a Soft Margin Classifier by K.K. Chin.

Neeraj Bhatnagar

You can also use uniform design model selection, which reduces the number of parameter tuples you need to check. The method is explained in the paper "Model selection for support vector machines via uniform design" by Chien-Ming Huang et al. A Python implementation exists in ssvm 0.2.

eSadr