I have four text files representing the economy, politics, health, and sport categories. Each file contains 400 Arabic words, each followed by its frequency in that category.
For example, health.txt contains:
اصابة 113
غذائية 6
طبيعي 6
...
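To make the input format concrete, here is a minimal Python sketch that reads such a file into a word-to-frequency map. It assumes UTF-8 files with one word and one integer per line, separated by whitespace. `parse_frequency_file` is a hypothetical helper name. It accepts the number on either side of the word because right-to-left display can make the order look reversed.

```python
def parse_frequency_file(text):
    """Parse lines like 'word 113' into a {word: count} dict.

    Assumes one entry per line with whitespace between the word and the
    count; whichever token is numeric is taken as the frequency.
    """
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines or elision markers such as "..."
        a, b = parts
        if b.isdigit():
            word, freq = a, int(b)
        elif a.isdigit():
            word, freq = b, int(a)
        else:
            continue  # neither token is a number
        counts[word] = counts.get(word, 0) + freq
    return counts


# Usage (hypothetical path): read the real file with
#   open("health.txt", encoding="utf-8").read()
sample = "إصابة 113\nغذائية 6\nطبيعي 6\n...\n"
print(parse_frequency_file(sample))
```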
I used the Simple CLI to create an ARFF file. The output ARFF file is as follows:
@relation C__finaloutput
@attribute text string
@attribute @@class@@ {economy,health,politics,sport}
@data
'إصابة 113\r\nغذائية 6\r\nطبيعي 6\r\nمريضا 6\r\n',health
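One way to avoid the string attribute entirely is to write one numeric attribute per word, with the frequency as its value, so the number is read as a count rather than as another token. The Python sketch below (helper names are hypothetical) emits such a file in Weka's sparse ARFF format. In that format each instance lists `index value` pairs with 0-based attribute indices, and any omitted attribute is 0. The sketch assumes the word counts have already been parsed into dictionaries.

```python
def arff_quote(name):
    # Quote attribute names so spaces or special characters stay legal;
    # escape backslashes first, then single quotes.
    return "'" + name.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_sparse_arff(docs, classes, relation="word_frequencies"):
    """docs: list of (class_label, {word: count}) pairs.

    Declares one numeric attribute per distinct word plus a nominal class
    attribute, then writes each instance as sparse ARFF.
    """
    vocab = sorted({w for _, counts in docs for w in counts})
    index = {w: i for i, w in enumerate(vocab)}
    class_idx = len(vocab)  # class attribute comes last
    lines = [f"@relation {relation}", ""]
    for w in vocab:
        lines.append(f"@attribute {arff_quote(w)} numeric")
    lines.append("@attribute @@class@@ {" + ",".join(classes) + "}")
    lines += ["", "@data"]
    for label, counts in docs:
        # Sparse entries must be in ascending index order.
        pairs = sorted((index[w], c) for w, c in counts.items())
        cells = [f"{i} {c}" for i, c in pairs]
        # Always write the class explicitly: an omitted nominal value
        # would be read as the first label.
        cells.append(f"{class_idx} {label}")
        lines.append("{" + ", ".join(cells) + "}")
    return "\n".join(lines) + "\n"


docs = [
    ("health", {"إصابة": 113, "غذائية": 6}),
    ("sport", {"مباراة": 40, "إصابة": 20}),
]
print(build_sparse_arff(docs, ["economy", "health", "politics", "sport"]))
```

Save the result with `encoding="utf-8"`. Because every attribute except the nominal class is numeric, SMO and J48 can load the file without a string-to-vector filter. With one file per category, however, this would give only one instance per class, which is too few to train and evaluate a classifier.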
...
My questions are:
1. How will Weka recognize the number in the ARFF file as the frequency of each word?
2. How can I use SMO or other classifiers, such as J48, that do not handle string attributes?
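If you prefer to keep the string attribute, you could store each word repeated as many times as its frequency instead of writing "إصابة 113". With the current file, a tokenizer would treat "113" as just another word. Weka's StringToWordVector filter (weka.filters.unsupervised.attribute.StringToWordVector) with the -C option outputs word counts rather than presence flags. It would therefore turn the repeated text back into numeric count attributes that SMO or J48 can use, for example wrapped in weka.classifiers.meta.FilteredClassifier. The Python helpers below are a hypothetical sketch of producing such rows for the @data section.

```python
def expand_counts(counts):
    """Turn {word: count} into a space-separated string in which each word
    is repeated `count` times, so a bag-of-words counter recovers the counts."""
    return " ".join(" ".join([w] * c) for w, c in counts.items() if c > 0)


def string_arff_row(label, counts):
    # Write the expanded text as one quoted ARFF string value followed by
    # the class label; escape backslashes and quotes to keep the row valid.
    text = expand_counts(counts).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}',{label}"


print(string_arff_row("health", {"غذائية": 2, "طبيعي": 1}))
```

Since each category has 400 words, you may also need to raise the filter's -W (words to keep) setting, whose default is 1000.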