
I have an ARFF file that was built with StringToWordVector and contains the features and their TF-IDF weights, like this:

```
@relation 'sss'
-weka.filters.unsupervised.attribute.StringToWordVector-R-W100-prune-rate-1.0-C-T-I-N0-S-stemmerweka.core.stemmers.NullStemmer -tokenizerweka.core.tokenizers.WordTokenizer -delimiters \" ؟،؛\\r\\t\\n.,;:\\\'\\\"()?!-><#$\\\%&*+/@^_=[]{}|`~0123456789\"'

@attribute @@class@@ {mis,pol}
@attribute water numeric
@attribute start numeric
@attribute government numeric

{2 0.285724,6 0.338022,7 0.517187,8 0.164801,9 ...}
{7 1.191401,8 0.560813,9 0.904039,10 0.322267....}
..
....
{0 pol,6 1.276448,36 0.702977,...}
```
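A note on reading the sparse rows above: each `index value` pair gives a 0-based attribute index and its value, and every attribute that is not listed is 0. For the nominal class attribute at index 0, the value 0 means its first label, so rows with no `0 ...` entry belong to `mis`. The minimal plain-Java sketch below parses one such row to make that concrete; `SparseArffRow`, `parse` and `classLabel` are hypothetical names, and it only handles simple unquoted values like the ones shown here (it is not Weka's own ARFF reader).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SparseArffRow {

    /** Parses a sparse ARFF row such as "{2 0.285724,6 0.338022}" into index -> raw value. */
    static Map<Integer, String> parse(String row) {
        String body = row.trim();
        if (!body.startsWith("{") || !body.endsWith("}")) {
            throw new IllegalArgumentException("Not a sparse ARFF row: " + row);
        }
        body = body.substring(1, body.length() - 1).trim();
        Map<Integer, String> values = new LinkedHashMap<>();
        if (body.isEmpty()) {
            return values; // every attribute is 0
        }
        for (String pair : body.split(",")) {
            String[] parts = pair.trim().split("\\s+", 2); // "6 0.338022" -> ["6", "0.338022"]
            values.put(Integer.parseInt(parts[0]), parts[1]);
        }
        return values;
    }

    /** Omitted index 0 means value 0, i.e. the first label of the nominal class attribute. */
    static String classLabel(Map<Integer, String> values, String[] labels) {
        return values.getOrDefault(0, labels[0]);
    }

    public static void main(String[] args) {
        String[] labels = {"mis", "pol"}; // order taken from "@attribute @@class@@ {mis,pol}"
        Map<Integer, String> a = parse("{2 0.285724,6 0.338022,7 0.517187}");
        Map<Integer, String> b = parse("{0 pol,6 1.276448,36 0.702977}");
        System.out.println(classLabel(a, labels) + " " + a); // mis {2=0.285724, 6=0.338022, 7=0.517187}
        System.out.println(classLabel(b, labels) + " " + b); // pol {0=pol, 6=1.276448, 36=0.702977}
    }
}
```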

Now I have a test folder that contains text files for the same two classes as the training set (pol and mis). I want to classify this test set and evaluate the model built from my training set. I know that for this purpose I should use batch filtering, so I read this link: http://weka.wikispaces.com/Use+WEKA+in+your+Java+code#Filter-Batch%20filtering. According to that page, my test and training sets should be in the same format (plain text). I don't know what to do when my training set is in ARFF format and my test set is in text format (I don't have the training set as text files).

MSepehr

1 Answer


You can do the following:

  1. Take your original training set file in ARFF format, before the StringToWordVector filter was applied to it.
  2. Generate a test set ARFF file from your test folder using TextDirectoryToARFF.
  3. Now you have two ARFF files that contain the raw text, so you can apply the StringToWordVector filter to both of them in batch mode.
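The reason step 3 works is that, in batch mode, the dictionary (and, with `-I`, the document frequencies) is computed once from the training data and then reused unchanged for the test data. In Weka's Java API this is the pattern from the batch-filtering page linked in the question: call `setInputFormat` on the filter, then `Filter.useFilter` on the training set first and on the test set second. The plain-Java sketch below only illustrates that fit-on-train, apply-to-test idea without Weka; `BatchVectorizer` is a hypothetical class, it tokenizes on any non-letter character, and its weights (log(1 + count) × log(N / df)) are an assumed simplification, not an exact reproduction of `StringToWordVector`'s output.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BatchVectorizer {
    private final Map<String, Integer> index = new LinkedHashMap<>(); // word -> attribute index
    private final List<Integer> docFreq = new ArrayList<>();          // training docs containing each word
    private int numTrainDocs;

    /** Splits on anything that is not a letter, so digits and punctuation act as delimiters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** First batch: dictionary and document frequencies come from the training texts only. */
    void fit(List<String> trainDocs) {
        numTrainDocs = trainDocs.size();
        for (String doc : trainDocs) {
            for (String word : new LinkedHashSet<>(tokenize(doc))) { // count each word once per doc
                Integer i = index.get(word);
                if (i == null) {
                    index.put(word, docFreq.size());
                    docFreq.add(1);
                } else {
                    docFreq.set(i, docFreq.get(i) + 1);
                }
            }
        }
    }

    /** Later batches: reuse the fixed dictionary; words unseen in training are dropped. */
    Map<Integer, Double> transform(String doc) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (String word : tokenize(doc)) {
            Integer i = index.get(word);
            if (i != null) {
                counts.merge(i, 1, Integer::sum);
            }
        }
        Map<Integer, Double> vector = new TreeMap<>(); // sparse vector: index -> weight
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            double tf = Math.log(1 + e.getValue());
            double idf = Math.log((double) numTrainDocs / docFreq.get(e.getKey()));
            vector.put(e.getKey(), tf * idf);
        }
        return vector;
    }

    int size() {
        return index.size();
    }

    Integer indexOf(String word) {
        return index.get(word);
    }

    public static void main(String[] args) {
        BatchVectorizer v = new BatchVectorizer();
        v.fit(List.of("water government start", "government policy", "water supply"));
        System.out.println("dictionary size: " + v.size()); // 5 distinct training words
        // "flood" never occurs in training, so only the "water" attribute gets a value.
        System.out.println(v.transform("water flood water"));
    }
}
```

Fitting a second vectorizer on the test folder instead would produce a different dictionary, which is the same kind of attribute mismatch the asker describes in the comments.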
  • My problem is that I don't have the training set in text format; I only have the ARFF file in the format shown above. – MSepehr Jan 15 '14 at 11:17
  • It seems strange that you say you do not have the training set in text format, as you are posting an ARFF file whose header shows an original relation named `'sss'` to which a `StringToWordVector` filter has been applied. In my answer I mean taking the original `@relation 'sss'` file. – Jose Maria Gomez Hidalgo Jan 16 '14 at 20:49
  • Let me describe the problem: I have three datasets in ARFF format whose attributes did not match (I asked about this problem here: http://stackoverflow.com/questions/21067439/how-to-match-attributes-order-of-two-instances-in-weka). I built these ARFF files with `StringToWordVector` **separately**. Now I have an ARFF file in the format I described, and I want to test an external test set, but I do not have the original text files. Is it possible to do that? – MSepehr Jan 21 '14 at 13:25
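Without the original texts the training TF-IDF weighting cannot be rebuilt exactly, but the `@attribute` names in the ARFF header are the training dictionary, so test documents can at least be mapped onto the same attribute indices. This is not something proposed in the thread; it is a rough workaround under stated assumptions. The sketch below reads header lines like the ones in the question and turns a test document into raw word counts over those indices. `ArffDictionary` is a hypothetical name; it assumes unquoted attribute names with one attribute per line, and because it does not recover the training IDF weights, its values are not on the same scale as the training data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ArffDictionary {

    /** Collects attribute names in header order; position = 0-based attribute index. */
    static List<String> attributeNames(List<String> headerLines) {
        List<String> names = new ArrayList<>();
        for (String line : headerLines) {
            String trimmed = line.trim();
            if (trimmed.regionMatches(true, 0, "@attribute", 0, 10)) {
                // Assumes unquoted names: "@attribute water numeric" -> "water"
                names.add(trimmed.split("\\s+")[1]);
            }
        }
        return names;
    }

    /** Maps a test document onto the training attribute indices as raw word counts. */
    static Map<Integer, Integer> toCounts(String doc, List<String> names) {
        Map<String, Integer> indexOf = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            indexOf.put(names.get(i), i);
        }
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String word : doc.split("[^\\p{L}]+")) { // words missing from the header are ignored
            Integer i = indexOf.get(word);
            if (i != null) {
                counts.merge(i, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> header = List.of(
                "@attribute @@class@@ {mis,pol}",
                "@attribute water numeric",
                "@attribute start numeric",
                "@attribute government numeric");
        List<String> names = attributeNames(header);
        System.out.println(names); // [@@class@@, water, start, government]
        System.out.println(toCounts("the government will start to supply water water", names)); // {1=2, 2=1, 3=1}
    }
}
```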