I am doing multiclass/multilabel text classification and trying to get rid of the "ConvergenceWarning".
When I increased max_iter from the default to 4000, the warning disappeared. However, my model accuracy dropped from 78 to 75.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score
import numpy as np

logreg = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', LogisticRegression(n_jobs=1, C=1e5, solver='lbfgs',
                               multi_class='ovr', random_state=0,
                               class_weight='balanced')),
])
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)
print('Logistic Regression Accuracy %s' % accuracy_score(y_test, y_pred))

# The pipeline starts with CountVectorizer, so pass the raw text
# (X_train), not pre-computed tf-idf features:
cv_score = cross_val_score(logreg, X_train, y_train, cv=10, scoring='accuracy')
print("CV Score : Mean : %.7g | Std : %.7g | Min : %.7g | Max : %.7g"
      % (np.mean(cv_score), np.std(cv_score), np.min(cv_score), np.max(cv_score)))
Why does my accuracy drop when max_iter=4000? Is there any other way to fix the warning: *"ConvergenceWarning: lbfgs failed to converge. Increase the number of iterations."*
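One alternative I want to check is whether a smaller C (stronger regularization) lets lbfgs converge without raising max_iter at all, since C=1e5 is effectively unregularized. A minimal runnable sketch of that idea; the toy data and the C=1.0 value are assumptions for illustration, not from my real setup:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression

# Same pipeline shape as above, but with the default C=1.0 instead of C=1e5.
# Near-zero regularization (C=1e5) can make lbfgs need far more iterations
# to converge; a smaller C is one thing to try before raising max_iter.
logreg_alt = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', LogisticRegression(C=1.0, solver='lbfgs',
                               random_state=0, class_weight='balanced')),
])

# Toy corpus just to show the pipeline runs end to end
# (my real X_train is a list of documents like this):
toy_X = ["good movie", "bad movie", "great film", "terrible film"]
toy_y = [1, 0, 1, 0]
logreg_alt.fit(toy_X, toy_y)
print(logreg_alt.predict(["good film"]))
```

If a smaller C both silences the warning and keeps accuracy up, that would suggest the C=1e5 model was only reaching 78 by stopping early, before fully fitting an unregularized solution.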