18

Using scikit-learn's SVM in Python, after running clf.fit(X, Y) you get your support vectors. Can I load these support vectors directly (passing them as a parameter) when instantiating an svm.SVC object? That way I would not need to run the fit() method each time I want to make predictions.

ymliu
  • Possible duplicate http://stackoverflow.com/questions/11440970/how-can-i-save-a-libsvm-python-object-instance – Pedrom Mar 22 '13 at 11:46

2 Answers

23

From the scikit manual: http://scikit-learn.org/stable/modules/model_persistence.html

1.2.4 Model persistence

It is possible to save a model in the scikit by using Python’s built-in persistence model, namely pickle.

>>> from sklearn import svm
>>> from sklearn import datasets
>>> clf = svm.SVC()
>>> iris = datasets.load_iris()
>>> X, y = iris.data, iris.target
>>> clf.fit(X, y)
SVC(kernel='rbf', C=1.0, probability=False, degree=3, coef0=0.0, eps=0.001,
cache_size=100.0, shrinking=True, gamma=0.00666666666667)
>>> import pickle
>>> s = pickle.dumps(clf)
>>> clf2 = pickle.loads(s)
>>> clf2.predict(X[0:1])  # current scikit-learn expects a 2D array, even for one sample
array([0])
>>> y[0]
0
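
To connect this back to the question: the fitted support vectors are stored on the classifier object, so they travel with it through pickle and fit() does not have to be called again. The following is a minimal sketch, assuming a recent scikit-learn and NumPy; the variable names are illustrative only.

```python
import pickle

import numpy as np
from sklearn import datasets, svm

X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC().fit(X, y)

# Serialize the fitted classifier and restore it without calling fit() again.
clf2 = pickle.loads(pickle.dumps(clf))

# The restored object carries the same support vectors and gives the same predictions.
same_vectors = np.array_equal(clf.support_vectors_, clf2.support_vectors_)
same_predictions = np.array_equal(clf.predict(X), clf2.predict(X))
print(same_vectors, same_predictions)
```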

In the specific case of the scikit, it may be more interesting to use joblib’s replacement of pickle, which is more efficient on big data, but can only pickle to the disk and not to a string:

>>> import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
>>> joblib.dump(clf, 'filename.pkl')
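
The excerpt stops at the dump step. Reading the model back is done with joblib.load. Here is a rough sketch of the full round trip, assuming the standalone joblib package is installed (newer scikit-learn releases no longer ship sklearn.externals.joblib) and writing to a temporary directory rather than a real model folder.

```python
import os
import tempfile

import joblib
from sklearn import datasets, svm

X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC().fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    # The temporary location is an assumption for this sketch only.
    model_path = os.path.join(tmp_dir, "filename.pkl")
    joblib.dump(clf, model_path)         # write the fitted model to disk
    clf_loaded = joblib.load(model_path)  # read it back; no fit() needed

# predict() expects a 2D array, so slice instead of indexing a single row.
first_prediction = clf_loaded.predict(X[0:1])
print(first_prediction, y[0])
```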
DanielBarbarian
Robert
  • 2
    The link is broken. Use this instead: http://scikit-learn.org/stable/modules/model_persistence.html – Tommz May 22 '15 at 18:08
  • 1
    Note that with pickle you tie yourself to a specific scikit version, so it is not a good solution for long-term storage of models. – Adversus Apr 14 '16 at 08:08
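
One possible way to guard against the version issue raised in the comment above is to store the scikit-learn version next to the pickled model and check it before unpickling. This is only a sketch: the helpers dump_with_version and load_with_version_check are hypothetical names, not part of scikit-learn, and a strict equality check is just one policy among several.

```python
import pickle

import sklearn
from sklearn import datasets, svm


def dump_with_version(model):
    """Hypothetical helper: pickle the model plus the scikit-learn version that fitted it."""
    bundle = {
        "sklearn_version": sklearn.__version__,
        # Keep the model as separate bytes so the version can be read first.
        "model_bytes": pickle.dumps(model),
    }
    return pickle.dumps(bundle)


def load_with_version_check(payload):
    """Hypothetical helper: refuse to unpickle a model saved by another version."""
    bundle = pickle.loads(payload)
    if bundle["sklearn_version"] != sklearn.__version__:
        raise RuntimeError(
            "model was saved with scikit-learn %s, but %s is installed"
            % (bundle["sklearn_version"], sklearn.__version__)
        )
    return pickle.loads(bundle["model_bytes"])


X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC().fit(X, y)
payload = dump_with_version(clf)
restored = load_with_version_check(payload)
```

For long-term storage, keeping the training data and code so the model can be refitted is the safer fallback when the check fails.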
3

You can save the model in order to use it later. The code below loads a model that was fitted and saved earlier if one exists; otherwise it fits a new model and saves it.

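import joblib  # standalone package; sklearn.externals.joblib was removed in scikit-learn 0.23
from sklearn import svm

svm_linear_estimator = svm.SVC(kernel='linear', probability=False, C=1)
model_path = "/my_models/%s.pkl" % dataset_name
try:
    # Reuse the model fitted and saved on an earlier run.
    estimator = joblib.load(model_path)
    print("using trained model")
except FileNotFoundError:
    # No saved model yet: fit a new one and save it for next time.
    print("building new model")
    estimator = svm_linear_estimator
    estimator.fit(data_train, class_train)
    joblib.dump(estimator, model_path)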
Bilal Dadanlar
  • When you save the trained model, joblib can create more than one file, but you still load it using the "dataset_name.pkl" name. Also, the variable estimator above should have been svm_linear_estimator. – Bilal Dadanlar May 13 '13 at 08:53
  • 1
    I just realized that using os.path.exists() is smarter than using try/except :) – Bilal Dadanlar May 21 '13 at 14:30
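
The os.path.exists() variant mentioned in the last comment could look roughly like this. It is a sketch under assumptions: load_or_fit is a hypothetical helper name, the iris data stands in for data_train and class_train, and a temporary directory stands in for the /my_models folder.

```python
import os
import tempfile

import joblib
from sklearn import datasets, svm


def load_or_fit(model_path, data_train, class_train):
    """Hypothetical helper: return (estimator, was_loaded_from_disk)."""
    if os.path.exists(model_path):
        # A model was saved on an earlier run, so reuse it.
        return joblib.load(model_path), True
    # No saved model yet: fit a new one and save it for next time.
    estimator = svm.SVC(kernel="linear", probability=False, C=1)
    estimator.fit(data_train, class_train)
    joblib.dump(estimator, model_path)
    return estimator, False


X, y = datasets.load_iris(return_X_y=True)
with tempfile.TemporaryDirectory() as model_dir:
    path = os.path.join(model_dir, "iris.pkl")
    _, loaded_first = load_or_fit(path, X, y)           # first call fits and saves
    estimator, loaded_second = load_or_fit(path, X, y)  # second call loads from disk
print(loaded_first, loaded_second)
```

Checking for the file first keeps unrelated errors (for example a corrupt file) from being silently treated as "no model yet", which a bare except would do.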