I have 3 datasets for sentiment analysis. I want to use only one of them to build the model and the other two for testing. The model I will use is an SVM (SMO algorithm). Each dataset starts with only 2 attributes (text, label), but after preprocessing with StringToWordVector it has many attributes. I was able to build a model and test it using 10-fold cross-validation, and now I want to test it on the other datasets. But since they end up with different attributes after StringToWordVector, I can't do it. Is there a solution for this?

I already applied the same preprocessing to the test set and tried using InputMappedClassifier, but the result is still an error.

I was hoping the model could be used on datasets that it has never seen.

Wannabepro

1 Answer

See http://jmgomezhidalgo.blogspot.com/2013/05/mapping-vocabulary-from-train-to-test.html

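If you have both the training and the test data available up front, you can use batch filtering, so that the test set is converted with the vocabulary learned from the training set.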
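To make the idea concrete, here is a minimal sketch in plain Java of what mapping the vocabulary from train to test means. It is not Weka code: the class `VocabularyMapping` and its methods `fit` and `transform` are hypothetical names made up for illustration, and the whitespace tokenizer is a simplifying assumption. The point is only that the word-to-column mapping is learned once from the training texts and then reused unchanged for the test texts, so both end up with the same attributes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Weka.
class VocabularyMapping {

    // Word -> column index, learned from the training texts only.
    private final Map<String, Integer> vocabulary = new LinkedHashMap<>();

    // Build the vocabulary from the training texts.
    public static VocabularyMapping fit(List<String> trainingTexts) {
        VocabularyMapping mapping = new VocabularyMapping();
        for (String text : trainingTexts) {
            for (String token : tokenize(text)) {
                // A new word gets the next free column index.
                mapping.vocabulary.putIfAbsent(token, mapping.vocabulary.size());
            }
        }
        return mapping;
    }

    // Turn any text into word counts over the training vocabulary.
    // Words never seen in training have no column and are dropped.
    public int[] transform(String text) {
        int[] counts = new int[vocabulary.size()];
        for (String token : tokenize(text)) {
            Integer column = vocabulary.get(token);
            if (column != null) {
                counts[column]++;
            }
        }
        return counts;
    }

    // Simplifying assumption: lowercase and split on whitespace.
    private static String[] tokenize(String text) {
        String trimmed = text.toLowerCase().trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // Training texts define the columns: good=0, movie=1, bad=2.
        VocabularyMapping mapping = fit(Arrays.asList("good movie", "bad movie"));
        // A test text is mapped onto the same three columns; "film" is ignored.
        System.out.println(Arrays.toString(mapping.transform("good good film")));
    }
}
```

If you instead run StringToWordVector separately on each dataset, each run builds its own vocabulary, which is why the attributes of the training and test sets no longer match.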

If you don't know the test data in advance, you can use the FilteredClassifier approach, which wraps the filter and the classifier together. See http://jmgomezhidalgo.blogspot.com/2013/01/text-mining-in-weka-chaining-filters.html and http://jmgomezhidalgo.blogspot.com/2013/04/a-simple-text-classifier-in-java-with.html

Also have a look at the question "How to use StringToWordVector (weka) in java?"

hkn