
I'm using scikit-learn's LinearSVC as a statistical classifier in text classification. My features are uncentered tf-idf.

When the fit_intercept attribute is set to False, classification accuracy increases significantly, which contradicts the expectation that the absolute values of the features do not impact the performance of a statistical classifier.

What can cause the change in classification accuracy I am observing?

  • Are you in a multi-class or binary setting? Also, you are talking about generalization, i.e. test performance, right? The training performance definitely should be better with intercept. – Andreas Mueller May 07 '13 at 14:49
  • It's a multilabel set (handled by OneVsRestClassifier). I'm talking about test performance of course. – lizarisk May 07 '13 at 15:51
  • Can you check the training performance? The only explanation that comes to my mind is that there is an imbalance between label presence and absence in the training set, but not in the test set. – Andreas Mueller May 07 '13 at 18:15
  • The training performance is very good in both cases (with and without intercept). The imbalance is not likely to be the reason because the training and testing sets were obtained randomly from one bigger set. – lizarisk May 07 '13 at 18:50

0 Answers