
I am wondering: is there any option in scikit-learn classifiers to fit a model using some hyperparameters and, after changing a few of them, refit it while saving computation (fit) cost?

Let us say a logistic regression is fit with C=1e5 (logreg = linear_model.LogisticRegression(C=1e5)) and we change only C to C=1e3. I want to save some computation because only one parameter is changed.
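For reference, here is a minimal sketch of the straightforward approach that refits from scratch each time (the synthetic dataset below is illustrative):

from sklearn import linear_model
from sklearn.datasets import make_classification

# illustrative synthetic data
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# first fit with C=1e5
logreg = linear_model.LogisticRegression(C=1e5)
logreg.fit(X, y)

# changing C and calling fit again restarts the optimization from scratch
logreg.C = 1e3
logreg.fit(X, y)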

  • As far as I know, bayesian hyperparameter optimization is the fastest method: https://github.com/fmfn/BayesianOptimization. You can create a new question for this, maybe people will have better ideas. – Mohamed Ali JAMAOUI Aug 11 '17 at 20:05
  • 1
    Bayesian hyperparameter optimization often has the problem that it has more hyperparameters than the model you're trying to tune. Here is an interesting alternative: http://blog.dlib.net/2017/12/a-global-optimization-algorithm-worth.html – Matti Wens May 30 '19 at 12:13
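For what it's worth, a minimal sketch of the Bayesian-optimization approach mentioned in the comments above, assuming the bayes_opt package from the linked repository (the dataset, search bounds, and scoring below are illustrative):

from bayes_opt import BayesianOptimization
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

def cv_score(log_C):
    # search in log space so the optimizer explores orders of magnitude of C
    model = LogisticRegression(C=10 ** log_C)
    return cross_val_score(model, X, y, cv=3).mean()

optimizer = BayesianOptimization(f=cv_score, pbounds={'log_C': (-3, 5)}, random_state=0)
optimizer.maximize(init_points=3, n_iter=10)
print(optimizer.max)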

1 Answer


Yes, there is a parameter called warm_start which, citing from the documentation, means:

warm_start : bool, default: False
When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Useless for liblinear solver.

As described in the documentation here, it's available in LogisticRegression:

sklearn.linear_model.LogisticRegression(..., warm_start=False, n_jobs=1)

So concretely, for your case you would do the following:

from sklearn.linear_model import LogisticRegression

# create an instance of LogisticRegression with warm_start=True;
# note that warm_start is ignored by the liblinear solver, so pick
# a solver that supports it (e.g. 'lbfgs', 'newton-cg', or 'sag')
logreg = LogisticRegression(C=1e5, warm_start=True, solver='lbfgs')
# you can access the C parameter's value as follows
logreg.C
# it's set to 100000.0

# ....
# train your model here by calling logreg.fit(..)
# ....

# reset the value of the C parameter as follows
logreg.C = 1e3

logreg.C
# now it's set to 1000.0

# ....
# re-train your model here by calling logreg.fit(..)
# ....

As far as I have been able to check quickly, it's also available in several other estimators, for example SGDClassifier, RandomForestClassifier, GradientBoostingClassifier and MLPClassifier.
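For example, with RandomForestClassifier, warm_start lets you grow an already-fitted forest instead of rebuilding it (a minimal sketch on illustrative synthetic data):

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# fit an initial forest of 100 trees
forest = RandomForestClassifier(n_estimators=100, warm_start=True, random_state=0)
forest.fit(X, y)

# grow the same forest to 150 trees; only the 50 new trees are fitted
forest.n_estimators = 150
forest.fit(X, y)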

  • Thanks. Actually, I am experimenting to search for the best parameters (as an alternative to Grid Search in sklearn). Theoretically, my stance is that if we change only a single parameter and `fit` the model again, it should take less time than the first time (if `warm_start=True`), but empirically it is not true. Can you please help? – Techie Fort Aug 11 '17 at 20:03
  • @Nawaz I've just commented on your question. – Mohamed Ali JAMAOUI Aug 11 '17 at 20:06
  • Thanks, I have seen that. – Techie Fort Aug 11 '17 at 20:07
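In case it helps others checking the same thing: a minimal sketch for measuring whether warm_start actually reduces work (the data is illustrative; timings and iteration counts will vary, and warm_start typically only pays off when the new C is close to the old one and the solver supports it):

import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5000, n_features=50, random_state=0)

logreg = LogisticRegression(C=1e5, warm_start=True, solver='lbfgs', max_iter=1000)

start = time.time()
logreg.fit(X, y)
print(time.time() - start, logreg.n_iter_)  # cold start

# a small change to C: the previous coefficients should be a good initialization
logreg.C = 1e3
start = time.time()
logreg.fit(X, y)
print(time.time() - start, logreg.n_iter_)  # warm start, usually fewer iterations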