
I'm using scikit-learn's LinearSVC as a statistical classifier in text classification. My features are uncentered tf-idf.

When the fit_intercept attribute is set to False, classification accuracy increases significantly, which contradicts the expectation that the absolute values of the features do not impact the performance of a statistical classifier.

What can cause the change in classification accuracy I am observing?

  • Are you in a multi-class or binary setting? Also, you are talking about generalization, i.e. test performance, right? The training performance definitely should be better with intercept. – Andreas Mueller May 07 '13 at 14:49
  • It's a multilabel set (handled by OneVsRestClassifier). I'm talking about test performance of course. – lizarisk May 07 '13 at 15:51
  • Can you check the training performance? The only explanation that comes to my mind is that there is an imbalance between label presence and absence in the training set, but not in the test set. – Andreas Mueller May 07 '13 at 18:15
  • The training performance is very good in both cases (with and without intercept). The imbalance is not likely to be the reason because the training and testing sets were obtained randomly from one bigger set. – lizarisk May 07 '13 at 18:50

0 Answers