I'm trying to train and run multi-class classifiers with Random Forest and Logistic Regression. On my machine (8 GB of RAM, Intel Core i5), it takes quite a long time to run, even though the dataset has only about 34K records. Is there any way I can speed up the current run time by tweaking a few parameters?
Below is an example using RandomizedSearchCV for the Logistic Regression model.
```
X.shape
Out[9]: (34857, 18)
Y.shape
Out[10]: (34857,)
Y.unique()
Out[11]: array([7, 3, 8, 6, 1, 5, 9, 2, 4], dtype=int64)
```
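The real features aren't shown, so for anyone who wants to reproduce the timings, here is a hypothetical stand-in built with scikit-learn's `make_classification`. The values are synthetic; only the shape (34857 × 18) and the nine labels 1-9 match the data above. It returns NumPy arrays, so `np.unique(Y)` takes the place of the pandas-style `Y.unique()`.

```python
import numpy as np
from sklearn.datasets import make_classification


def make_stand_in_data(n_samples=34857, n_features=18, n_classes=9, seed=0):
    """Hypothetical stand-in for the real X, Y: same shape, labels 1..n_classes.

    The feature values are synthetic and are NOT the data from the question.
    """
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=10,
        n_redundant=2,
        n_classes=n_classes,
        n_clusters_per_class=1,
        random_state=seed,
    )
    # make_classification labels classes 0..n_classes-1; shift them to
    # 1..n_classes to match the labels shown by Y.unique() above.
    return X, y + 1


X, Y = make_stand_in_data()
print(X.shape, Y.shape, np.unique(Y))
```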
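```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

params_logreg = {'C': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],  # C must be > 0
                 # 'liblinear' dropped: it cannot fit multi_class='multinomial'
                 'solver': ['newton-cg', 'lbfgs', 'sag', 'saga'],
                 'penalty': ['l2'],
                 'max_iter': [100, 200, 300, 400, 500],
                 'multi_class': ['multinomial']}
folds = 2
n_iter = 2
scoring = 'accuracy'
n_jobs = 1
model_logregression = LogisticRegression()
# X and Y go to fit(), not to the RandomizedSearchCV constructor;
# verbose=3 prints a [CV] line with the score for every fold.
model_logregression = RandomizedSearchCV(model_logregression, params_logreg, n_iter=n_iter,
                                         scoring=scoring, n_jobs=n_jobs, cv=folds, verbose=3)
model_logregression.fit(X, Y)
```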
```
[CV] solver=newton-cg, penalty=l2, multi_class=multinomial, max_iter=100, C=0.9
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[CV] solver=newton-cg, penalty=l2, multi_class=multinomial, max_iter=100, C=0.9, score=0.5663798049340218, total= 2.7min
[CV] solver=newton-cg, penalty=l2, multi_class=multinomial, max_iter=100, C=0.9
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 2.7min remaining: 0.0s
[CV] solver=newton-cg, penalty=l2, multi_class=multinomial, max_iter=100, C=0.9, score=0.5663625408848338, total= 4.2min
[CV] solver=sag, penalty=l2, multi_class=multinomial, max_iter=400, C=0.8
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 7.0min remaining: 0.0s
[CV] solver=sag, penalty=l2, multi_class=multinomial, max_iter=400, C=0.8, score=0.5663798049340218, total= 33.9s
[CV] solver=sag, penalty=l2, multi_class=multinomial, max_iter=400, C=0.8
[CV] solver=sag, penalty=l2, multi_class=multinomial, max_iter=400, C=0.8, score=0.5664773053308085, total= 26.6s
[Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 8.0min finished
```
The Logistic Regression search takes about 8 minutes in total, whereas RandomForestClassifier takes only about 52 seconds.
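In the log, the two newton-cg folds took 2.7 and 4.2 minutes while the two sag folds took about 30 seconds each, so most of the 8 minutes seems to come from the solver choice. Below is a minimal sketch for checking that in isolation: it times one `LogisticRegression` fit per solver, with and without a `StandardScaler` in front. `time_solvers` is a hypothetical helper, and the data is synthetic (already roughly unit-scale, so any scaling effect should show up mainly on the real X). It assumes scikit-learn 0.22 or later, where these solvers fit a multinomial model by default when there are more than two classes.

```python
import time
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def time_solvers(X, y, solvers=("newton-cg", "lbfgs", "sag", "saga"),
                 max_iter=100, C=1.0, scale=False):
    """Hypothetical helper: fit one LogisticRegression per solver and
    return the wall-clock seconds each fit took.

    With scale=True the features are standardized first; sag/saga are only
    guaranteed to converge fast when features have a similar scale.
    """
    timings = {}
    for solver in solvers:
        clf = LogisticRegression(solver=solver, C=C, max_iter=max_iter)
        model = make_pipeline(StandardScaler(), clf) if scale else clf
        start = time.perf_counter()
        with warnings.catch_warnings():
            # Hitting max_iter emits ConvergenceWarning; ignore it while timing.
            warnings.simplefilter("ignore", category=ConvergenceWarning)
            model.fit(X, y)
        timings[solver] = time.perf_counter() - start
    return timings


# Hypothetical synthetic data with the same number of features and classes.
X, y = make_classification(n_samples=5000, n_features=18, n_informative=10,
                           n_classes=9, random_state=0)
results = {scale: time_solvers(X, y, scale=scale) for scale in (False, True)}
for scale, timings in results.items():
    for solver, seconds in timings.items():
        print(f"scale={scale!s:<5} {solver:>9}: {seconds:.2f} s")
```

Running the same helper on the real X and Y should show whether dropping newton-cg, scaling the features, or both accounts for most of the gap.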
Is there any way in which I can make this run faster by tweaking the parameters?