Why is a classification model in Weka predicting all instances as one class?

Question

I have built a classification model using Weka. I have two classes, {spam, non-spam}. After applying the StringToWordVector filter, I get 10000 attributes for 19000 records. I then use the LibLINEAR library to build a model, which gives the following F-scores: spam 94%, non-spam 98%.

When I use the same model to predict new instances, it predicts all of them as spam. Also, when I use a test set identical to the training set, it predicts all of them as spam too. I am mentally exhausted from trying to find the problem. Any help will be appreciated.

score 0 · Answer 1 · answered May 19 '15 at 16:44

I also get this wrong every so often. Then I watch this video to remind myself how it's done: https://www.youtube.com/watch?v=Tggs3Bd3ojQ In it, Prof. Witten, one of the Weka developers/architects, shows how to use the FilteredClassifier (which in turn is configured to wrap the StringToWordVector filter) correctly on the training dataset and the test set.

The video uses Weka 3.6; Weka 3.7 might be slightly different.
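Why the filter has to be shared between training and test data can be seen in a toy bag-of-words sketch. This is plain Java, not Weka's implementation; the class `VocabularyDemo` and its methods are hypothetical names made up for illustration. The word-to-column dictionary is learned from the training text only, and new messages must be mapped through that same dictionary, which is what wrapping the filter in a FilteredClassifier takes care of.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only (hypothetical class, not part of Weka).
public class VocabularyDemo {

    // Learn a word -> column index mapping from the given documents.
    public static Map<String, Integer> buildVocabulary(List<String> docs) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String doc : docs) {
            for (String token : doc.toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    vocab.putIfAbsent(token, vocab.size());
                }
            }
        }
        return vocab;
    }

    // Turn a document into word counts using an existing vocabulary.
    // Words that are not in the vocabulary are ignored.
    public static int[] vectorize(String doc, Map<String, Integer> vocab) {
        int[] counts = new int[vocab.size()];
        for (String token : doc.toLowerCase().split("\\s+")) {
            Integer column = vocab.get(token);
            if (column != null) {
                counts[column]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> train = List.of("win free money", "meeting at noon");
        Map<String, Integer> trainVocab = buildVocabulary(train);

        String newMessage = "free money at noon";

        // Correct: reuse the dictionary learned from the training data.
        int[] correct = vectorize(newMessage, trainVocab);
        System.out.println(Arrays.toString(correct)); // [0, 1, 1, 0, 1, 1]

        // Wrong: a fresh dictionary built from the new data has different
        // columns, so the model's learned weights no longer line up.
        int[] wrong = vectorize(newMessage, buildVocabulary(List.of(newMessage)));
        System.out.println(Arrays.toString(wrong)); // [1, 1, 1, 1]
    }
}
```

The second vector has a different length and a different meaning per column, so a model trained on the first layout cannot interpret it sensibly.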

What does ZeroR give you? ZeroR always predicts the majority class, so its accuracy is your baseline. If it's close to 100%, you know that any classification algorithm should not be too far off either.
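As a rough sketch of what that baseline means, the snippet below computes the accuracy of always predicting the most frequent class. It is plain Java rather than Weka code, the class `MajorityBaseline` is a hypothetical name, and the four labels are made-up data, not the asker's.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only (hypothetical class, not part of Weka).
public class MajorityBaseline {

    // Return the most frequent label; on a tie, the label seen first wins.
    public static String majorityClass(List<String> labels) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // Accuracy obtained by predicting the majority class for every instance.
    public static double baselineAccuracy(List<String> labels) {
        if (labels.isEmpty()) {
            return 0.0;
        }
        String majority = majorityClass(labels);
        long hits = labels.stream().filter(majority::equals).count();
        return (double) hits / labels.size();
    }

    public static void main(String[] args) {
        // Made-up labels for illustration.
        List<String> labels = List.of("spam", "spam", "spam", "non-spam");
        System.out.println(majorityClass(labels));    // spam
        System.out.println(baselineAccuracy(labels)); // 0.75
    }
}
```

A trained classifier is only interesting to the extent that it beats this number.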

Why do you optimize for F-measure? Just asking; I have never used it and don't know much about it. (I would optimize for the "Precision" metric, assuming you have much more spam than non-spam.)
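For reference, the F-measure is the harmonic mean of precision and recall, so it is high only when both are high. The sketch below shows how the three relate for one class; the class `FMeasureDemo` is a hypothetical name and the counts are made up for illustration, not taken from the question.

```java
import java.util.Locale;

// Toy illustration only (hypothetical class, not part of Weka).
public class FMeasureDemo {

    // Fraction of instances predicted as the class that really belong to it.
    public static double precision(int truePositives, int falsePositives) {
        int predicted = truePositives + falsePositives;
        return predicted == 0 ? 0.0 : (double) truePositives / predicted;
    }

    // Fraction of instances of the class that were actually found.
    public static double recall(int truePositives, int falseNegatives) {
        int actual = truePositives + falseNegatives;
        return actual == 0 ? 0.0 : (double) truePositives / actual;
    }

    // Harmonic mean of precision and recall.
    public static double fMeasure(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // Made-up confusion counts for the "spam" class.
        int tp = 90, fp = 10, fn = 30;
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        System.out.println(String.format(Locale.ROOT,
                "precision=%.3f recall=%.3f f=%.3f", p, r, fMeasure(p, r)));
        // precision=0.900 recall=0.750 f=0.818
    }
}
```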

I found the problem. I was not applying the StringToWordVector filter to the test instances. - user2335004, May 20 '15 at 16:51
