weka: train and test set in different format (arff and text format)

Question

I have an ARFF file that was built with StringToWordVector and contains features and their TF-IDF weights, like this:

@relation 'sss'
-weka.filters.unsupervised.attribute.StringToWordVector-R-W100-prune-rate-1.0-C-T-I-N0-S-stemmerweka.core.stemmers.NullStemmer -tokenizerweka.core.tokenizers.WordTokenizer -delimiters \" ؟،؛\\r\\t\\n.,;:\\\'\\\"()?!-><#$\\\%&*+/@^_=[]{}|`~0123456789\"'


@attribute @@class@@ {mis,pol}
@attribute water numeric
@attribute start numeric
@attribute government numeric

{2 0.285724,6 0.338022,7 0.517187,8 0.164801,9 ...}
{7 1.191401,8 0.560813,9 0.904039,10 0.322267....}
..
....
{0 pol,6 1.276448,36 0.702977,...}

Now I have a test folder that contains text files for the 2 classes (the same as in the training set: pol and mis). I want to classify this test set in order to evaluate the model built from my training set. I know that I should use a batch filter for this purpose, so I read this link: http://weka.wikispaces.com/Use+WEKA+in+your+Java+code#Filter-Batch%20filtering. According to this link, my test and training sets should be in the same format (plain text). I don't know what to do when my training set is in ARFF format and my test set is in text format (I don't have the training set as text files).

score 0 · Answer 1 · answered Jan 15 '14 at 07:07

You can do the following:

Take your previous training set file in ARFF format without applying the StringToWordVector filter.
Generate a test set file using TextDirectoryToARFF.
Now you have two ARFF files with text in plain format. Thus apply the StringToWordVector filter to both of them in batch mode.
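
To make clearer why both files have to go through the filter together, here is a minimal, library-free sketch of what batch mode amounts to: the word attributes are fixed from the training documents only, and every test document is then mapped onto those same attributes. This is not Weka's implementation. The class `BatchVectorizerSketch`, its methods, and the simple letter-only tokenizer are hypothetical names chosen for illustration, and plain word counts stand in for the TF-IDF weights. In Weka itself the same effect comes from initialising one `StringToWordVector` on the training set and passing both sets through it, as the batch filtering page linked in the question describes.

```java
import java.util.*;

// Illustrative only: mimics the idea of batch filtering, not Weka's code.
class BatchVectorizerSketch {
    // Vocabulary learned from the training documents: word -> attribute index.
    private final Map<String, Integer> vocabulary = new LinkedHashMap<>();

    // Hypothetical tokenizer: lower-case, split on anything that is not a letter.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Step 1: fix the attribute set using the training documents only.
    public void fit(List<String> trainDocs) {
        for (String doc : trainDocs) {
            for (String tok : tokenize(doc)) {
                vocabulary.putIfAbsent(tok, vocabulary.size());
            }
        }
    }

    // Step 2: map any document (train or test) onto the fixed attributes.
    // Words that never occurred in the training set are dropped.
    public double[] transform(String doc) {
        double[] vector = new double[vocabulary.size()];
        for (String tok : tokenize(doc)) {
            Integer index = vocabulary.get(tok);
            if (index != null) {
                vector[index] += 1.0;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        BatchVectorizerSketch filter = new BatchVectorizerSketch();
        filter.fit(Arrays.asList("water government", "start water"));
        // Attributes are now: water=0, government=1, start=2.
        double[] test = filter.transform("government water water election");
        System.out.println(Arrays.toString(test)); // prints [2.0, 1.0, 0.0]
    }
}
```

Because the test vector is built against the training vocabulary, its attributes line up with the training set by construction, which is exactly what is lost when each file is vectorised separately.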

answered Jan 15 '14 at 07:07

Jose Maria Gomez Hidalgo


My problem is: I don't have the training set in text format, I just have the ARFF file in the format mentioned. – MSepehr Jan 15 '14 at 11:17
It seems strange that you say you do not have the training set in text format, as you are posting an ARFF file whose header shows an original relation named `'sss'` to which a `StringToWordVector` filter has been applied. In my answer I mean taking the original `@relation 'sss'` file. – Jose Maria Gomez Hidalgo Jan 16 '14 at 20:49
Let me describe the problem: I have three datasets in ARFF format whose attributes do not match (I asked about this problem here: http://stackoverflow.com/questions/21067439/how-to-match-attributes-order-of-two-instances-in-weka ). I built these ARFF files with `StringToWordVector` **separately**. Now I have an ARFF file in the format I described and I want to test an external test set, but I do not have the original text files. Is it possible to do that? – MSepehr Jan 21 '14 at 13:25
