
I am trying to use OneVsRestClassifier on my data set. I extracted the features on which the model will be trained and fitted a linear SVC on them. After fitting, when I predict on the same data the model was fitted on, I get all zeros. Is this an implementation issue, or is my feature extraction not good enough? Since I am predicting on the same data the model was fitted on, I would expect close to 100% accuracy, but instead the model predicts all zeros. Here is my code:

#arrFinal contains all the features and the labels. Last 16 columns are labels and features are from 1 to 521. 17th column from the last is not taken
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X = np.array(arrFinal[:, 1:-17])
X = X.astype(float)
Xtest = np.array(X)

Y = np.array(arrFinal[:, 522:]).astype(float)
clf = OneVsRestClassifier(SVC(kernel='linear'))
clf.fit(X, Y)
ans = clf.predict(Xtest)
print(ans)
print("\n\n\n")

Is there something wrong with my implementation of OneVsRestClassifier?

  • Try parameter tuning. See if `SVC(kernel='linear', C=10000)` gives you different results. See http://stackoverflow.com/questions/34475245/sklearn-svm-svr-and-svc-getting-the-same-prediction-for-every-input/34475451#34475451 – David Maust Dec 27 '15 at 05:08
  • That is really weird. With a sufficiently high C, it should produce exactly what was given to it, unless the features are exactly the same or there is truly 0 correlation. I feel like we're missing something. Could you provide the data? – David Maust Dec 27 '15 at 05:26
  • http://pastie.org/private/jeusjl8nfna0vlelzbnbhq –  Dec 27 '15 at 05:29
  • That makes sense. Your `X` values are too small. Try feature scaling, or increase your `C` by a whole lot. – David Maust Dec 27 '15 at 05:32
  • Thanks it worked. Actually I was normalizing my X before predict. I removed the normalization and it worked. –  Dec 27 '15 at 05:38
  • Excellent. I'm adding an answer with a StandardScaler example. It might be useful for you. – David Maust Dec 27 '15 at 05:40
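The effect diagnosed in the comments can be reproduced on synthetic data (a sketch with hypothetical shapes and scales, not the asker's dataset): when feature values are tiny relative to the default `C`, the margin penalty dominates the hinge loss, so the fitted linear SVC collapses to predicting the majority class for every sample, even on the training data. Standardizing the features restores a sensible scale.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the asker's data: 30 samples of class 0 and
# 10 of class 1, with all feature values on the order of 1e-6.
rng = np.random.RandomState(0)
X = np.vstack([rng.rand(30, 5), rng.rand(10, 5) + 2.0]) * 1e-6
y = np.array([0] * 30 + [1] * 10)

# With such tiny feature values, the default C makes the regularization
# term dominate: the model predicts the majority class everywhere.
raw = SVC(kernel='linear').fit(X, y)
print(raw.predict(X))  # all zeros, even on the training data

# Standardizing the features restores a workable scale.
Xs = StandardScaler().fit_transform(X)
scaled = SVC(kernel='linear').fit(Xs, y)
print(scaled.predict(Xs))
```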

1 Answer


After looking at your data, it appears the feature values may be too small for the C value. Try using `sklearn.preprocessing.StandardScaler`.

import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.array(arrFinal[:, 1:-17])
X = X.astype(float)

# Standardize features to zero mean and unit variance before fitting.
scaler = StandardScaler()
X = scaler.fit_transform(X)
Xtest = np.array(X)

Y = np.array(arrFinal[:, 522:]).astype(float)
clf = OneVsRestClassifier(SVC(kernel='linear', C=100))
clf.fit(X, Y)
ans = clf.predict(Xtest)
print(ans)

From here, you should tune C using cross-validation, either with a learning curve or a grid search.
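A grid search over C could look like the sketch below (toy multilabel data standing in for `arrFinal`; the shapes and C candidates are hypothetical). Note that in current scikit-learn versions `GridSearchCV` lives in `sklearn.model_selection`, and parameters of the SVC wrapped inside `OneVsRestClassifier` are addressed through the `estimator__` prefix.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy multilabel data (hypothetical stand-in for arrFinal):
# each of the 3 labels is a simple threshold on one feature.
rng = np.random.RandomState(0)
X = rng.rand(120, 10)
Y = (X[:, :3] > 0.5).astype(int)

X = StandardScaler().fit_transform(X)

# The wrapped SVC's C is reached via the 'estimator__' prefix.
param_grid = {'estimator__C': [0.1, 1, 10, 100, 1000]}
search = GridSearchCV(OneVsRestClassifier(SVC(kernel='linear')),
                      param_grid, cv=3)
search.fit(X, Y)
print(search.best_params_)
```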

David Maust