
To improve Support Vector Machine results, I need to use grid search to find better parameters, together with cross-validation. I'm not sure how to combine them in scikit-learn. Grid search finds the best parameters (http://scikit-learn.org/stable/modules/grid_search.html) and cross-validation helps avoid overfitting (http://scikit-learn.org/dev/modules/cross_validation.html).

#GRID SEARCH
from sklearn import grid_search, svm
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
svr = svm.SVC()
clf = grid_search.GridSearchCV(svr, parameters)
#print(clf.fit(X, Y))

#CROSS VALIDATION
from sklearn import cross_validation
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, Y, test_size=0.4, random_state=0)
clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)

print("crossvalidation")
print(clf.score(X_test, y_test))
clf = svm.SVC(kernel='linear', C=1)
scores = cross_validation.cross_val_score(clf, X, Y, cv=3)
print(scores)

Results:

GridSearchCV(cv=None,
   estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
kernel=rbf, probability=False, shrinking=True, tol=0.001, verbose=False),
   estimator__C=1.0, estimator__cache_size=200,
   estimator__class_weight=None, estimator__coef0=0.0,
   estimator__degree=3, estimator__gamma=0.0, estimator__kernel=rbf,
   estimator__probability=False, estimator__shrinking=True,
   estimator__tol=0.001, estimator__verbose=False, fit_params={},
   iid=True, loss_func=None, n_jobs=1,
   param_grid={'kernel': ('linear', 'rbf'), 'C': [1, 10]},
   pre_dispatch=2*n_jobs, refit=True, score_func=None, verbose=0)

crossvalidation
0.0
[ 0.11111111  0.11111111  0.        ]
herrfz
postgres

1 Answer


You should first split the data into a development set and an evaluation set, run the grid search (which performs its own cross-validation internally) on the development set only, and then compute a single final score on the held-out evaluation set at the end.

There is an example in the documentation.
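
The split-then-search workflow described above could look roughly like the sketch below. It is not the documentation example; it assumes a recent scikit-learn where `GridSearchCV` and `train_test_split` live in `sklearn.model_selection` (the `grid_search` and `cross_validation` modules used in the question were later removed), and it uses the iris dataset purely as stand-in data. The parameter grid is the one from the question.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Stand-in data (assumption): replace with your own X, Y.
X, Y = load_iris(return_X_y=True)

# 1. Hold out an evaluation set that the grid search never sees.
X_dev, X_eval, y_dev, y_eval = train_test_split(
    X, Y, test_size=0.4, random_state=0, stratify=Y
)

# 2. Grid search with internal 3-fold cross-validation,
#    run on the development set only.
parameters = {"kernel": ("linear", "rbf"), "C": [1, 10]}
search = GridSearchCV(SVC(), parameters, cv=3)
search.fit(X_dev, y_dev)

# 3. One final score on the held-out evaluation set, computed with
#    the best estimator (refit on the whole development set by default).
final_score = search.score(X_eval, y_eval)
print(search.best_params_)
print(final_score)
```

Because the evaluation set plays no part in choosing `kernel` or `C`, the final score is not biased by the parameter search itself.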

ogrisel
  • I tried to run it with my data and got this error: clf = GridSearchCV(SVC(C=1), tuned_parameters, scoring=score) TypeError: __init__() got an unexpected keyword argument 'scoring'. I also tried to run the original example and got the same error. How is that possible? scoring is a function parameter! – postgres Feb 17 '13 at 15:50
  • Check the version number of the doc and select the one that matches what you installed. The URLs are different for each version: http://scikit-learn.org/dev/modules/grid_search.html is the development branch, http://scikit-learn.org/stable/modules/grid_search.html is the last released version (0.13 at the time of writing), and http://scikit-learn.org/0.13/modules/grid_search.html is a fixed URL for the 0.13 release. – ogrisel Feb 18 '13 at 10:58
  • I fixed the answer to point to the stable version of the doc. – ogrisel Jun 09 '15 at 08:00
  • @ogrisel the example link is broken now. – Rafs Jun 01 '22 at 16:10
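
For the version mismatch discussed in the comments above, a quick way to find out which documentation URL applies is to print the installed version. This is a minimal sketch; the variable name is only illustrative.

```python
import sklearn

# The installed release, e.g. to pick the matching versioned docs URL.
installed_version = sklearn.__version__
print(installed_version)
```

If the printed version is older than the documentation being read, keyword arguments shown there (such as `scoring` in the comment above) may not exist yet in the installed release.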