
I applied an SVM (scikit-learn) to a dataset and wanted to find the values of C and gamma that give the best accuracy on the test set.

I first fixed C to some integer and then iterated over many values of gamma until I found the gamma that gave the best test-set accuracy for that C. Then I fixed that gamma, iterated over values of C to find the C that gave the best accuracy, and so on ...
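Roughly, the procedure I described looks like this (`score` and `alternating_search` are hypothetical names I'm using for illustration; `score` just trains an SVC and returns test-set accuracy):

```python
from sklearn.svm import SVC

# Hypothetical helper: train an SVC with the given C and gamma and
# return its accuracy on the test set.
def score(C, gamma, X_train, y_train, X_test, y_test):
    clf = SVC(C=C, gamma=gamma)
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)

def alternating_search(X_train, y_train, X_test, y_test,
                       gammas=(0.0001, 0.001, 0.01, 0.1),
                       Cs=(0.01, 0.1, 1, 10, 100),
                       n_rounds=3):
    # Start from an arbitrary fixed C, then alternate: pick the best
    # gamma for the current C, then the best C for that gamma, etc.
    C = 1.0
    for _ in range(n_rounds):
        gamma = max(gammas, key=lambda g: score(C, g, X_train, y_train,
                                                X_test, y_test))
        C = max(Cs, key=lambda c: score(c, gamma, X_train, y_train,
                                        X_test, y_test))
    return C, gamma
```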

But the above steps are not guaranteed to find the combination of gamma and C that produces the best test-set accuracy.

Can anyone help me find a way to get this combination (gamma, C) in scikit-learn?

asn
  • Surely not! Because there is a high chance that I will get stuck in a local maximum, and the combination of C and gamma will not give me the best accuracy. – asn Sep 30 '17 at 14:56
  • Did you try implementing it, or are you just guessing? Grid search will try all possible combinations, hence it won't get stuck in local maxima. – Gambit1614 Sep 30 '17 at 15:04
  • @MohammedKashif I tried it, but the process seems to be unending: fixing one and iterating over the other, then doing the same for the other. – asn Mar 16 '18 at 19:15

1 Answer


You are looking for hyper-parameter tuning. In parameter tuning we pass a dictionary containing lists of possible values for your classifier; then, depending on the method you choose (i.e. GridSearchCV, RandomizedSearchCV, etc.), the best possible parameters are returned. You can read more about it here.

As an example:

from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

#Create a dictionary of possible parameters
params_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100],
               'gamma': [0.0001, 0.001, 0.01, 0.1],
               'kernel': ['linear', 'rbf']}

#Create the GridSearchCV object
grid_clf = GridSearchCV(SVC(class_weight='balanced'), params_grid)

#Fit the data to search for the best possible parameters
grid_clf.fit(X_train, y_train)

#Print the best estimator with its parameters
print(grid_clf.best_estimator_)

You can read more about GridSearchCV here and RandomizedSearchCV here. A word of caution though: SVMs take a lot of CPU power, so be careful with the number of parameters you pass. It might take some time to process depending on your data and the number of parameter combinations.
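For larger grids, RandomizedSearchCV follows the same pattern but samples a fixed number of combinations instead of trying them all. A minimal sketch (the value ranges are illustrative, and the iris data is just a stand-in for your own):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X_train, y_train = load_iris(return_X_y=True)

# Candidate values for C and gamma, spaced on a log scale
param_dist = {'C': np.logspace(-3, 2, 20),
              'gamma': np.logspace(-4, -1, 20),
              'kernel': ['linear', 'rbf']}

# Try only 25 random combinations out of the 20 * 20 * 2 = 800 possible
rand_clf = RandomizedSearchCV(SVC(class_weight='balanced'), param_dist,
                              n_iter=25, cv=3, random_state=0)
rand_clf.fit(X_train, y_train)
print(rand_clf.best_params_)
```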

This link also contains an example.

Gambit1614
  • You are creating a variable `params_grid` and using `params_grids`. Please correct that. Also, this gives an error saying "'SVC' object has no attribute 'best_estimators'". Can you please provide complete code? – Vipul Sharma Oct 28 '18 at 11:32
  • @VipulSharma use `clf.best_params_` (on the `clf` object) – Arthur Attout Dec 09 '18 at 21:09
  • Thanks for the answer. After getting optimal parameters, how can we verify that they are good? Is it using `X_test`? Can we use cross validation instead? :) – EmJ Apr 10 '19 at 04:34
  • @Emi you need to use `X_test` to test your classifier. If you want to use cross-validation, just specify the `cv` attribute in `GridSearchCV`. – Gambit1614 Apr 10 '19 at 07:27
  • @Gambit thanks a lot :) btw please let me know if you know an answer for this. https://stackoverflow.com/questions/55609339/how-to-perform-feature-selection-with-gridsearchcv thank you :) – EmJ Apr 10 '19 at 09:52
  • @Gambit thanks a lot for the great answer. Yes, it is very helpful. Just a quick question: is there a way to get the selected features from rfecv? Moreover, how can we validate X_test using the selected features? Looking forward to hearing from you. Thank you very much once again :) – EmJ Apr 10 '19 at 10:25