Convert a text file containing words and their frequencies to an ARFF file suitable for Weka

Question

I have 4 text files representing the economy, politics, health, and sport categories. Each file contains 400 Arabic words, along with the frequency of each word, which are used to represent that category.

For example, health.txt contains:

اصابة 113

غذائية 6

طبيعي 6

...

I used the Simple CLI to create the ARFF file. The output ARFF file is as follows:

@relation C__finaloutput

@attribute text string

@attribute @@class@@ {economy,health,politics,sport}

@data

'إصابة 113\r\nغذائية 6\r\nطبيعي 6\r\nمريضا 6\r\n',health

.

My problems are:

1. How will Weka recognize the number in the ARFF file as the frequency of each word?

2. How can I use the SMO classifier, or other classifiers such as J48, which do not handle string attributes?

I do not understand what your problem is. We have not seen your code, so we cannot find the problem. It seems you are not sure what you want to do. I guess you already know the ARFF file format (it is quite simple). Where did you get stuck? (Asking for external resources, such as tools, is off-topic on SO. Asking for suggestions on the best file formats, and similar things, is also off-topic.) – Gábor Bakos, May 26 '15 at 00:16

score 0 · Answer 1 · answered May 26 '15 at 01:31

Weka can load CSV files from the Explorer "Open file" dialog, from the command line, or in code. Your file uses a space or a tab as the delimiter rather than a comma, but CSVLoader can handle that too: see the -F (field separator) option described in the docs for CSVLoader. Alternatively, you could just convert the spaces (or tabs) to commas using any of a variety of tools, for example sed -e 's/ /,/' health.txt > health.csv.
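If sed is not available, the same conversion can be sketched in Python. This is only an illustration, not something the answer above prescribes: the function name word_freq_to_csv and the header names word and frequency are made up for this example, and it assumes each line holds exactly one word and one count, in either order.

```python
import csv
import io


def word_freq_to_csv(lines, out):
    """Write 'word,frequency' rows from whitespace-separated input lines."""
    writer = csv.writer(out, lineterminator="\n")
    # Header names are an assumption made for this sketch.
    writer.writerow(["word", "frequency"])
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        first, second = parts
        # Accept either "word count" or "count word".
        if second.isdecimal():
            word, count = first, second
        elif first.isdecimal():
            word, count = second, first
        else:
            continue  # no numeric field found, skip the line
        writer.writerow([word, int(count)])


# Sample lines taken from the health.txt excerpt in the question.
sample = ["اصابة 113", "غذائية 6", "طبيعي 6"]
buffer = io.StringIO()
word_freq_to_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

For a real file you would pass an open file object instead of the sample list, opened with whatever encoding the file actually uses (for example open("health.txt", encoding="utf-8"), if the file is UTF-8). The header row is written because, as far as I know, CSVLoader takes the first row as attribute names by default.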

Oliver Dain

I used the command line to convert the texts to an ARFF file, but: 1. How will Weka recognize the number in the ARFF file as the frequency of each word? 2. How can I use the SMO classifier, or other classifiers such as J48, which do not handle string attributes? – In2015 May 26 '15 at 11:22
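One possible way to approach both points in this comment, offered as a sketch rather than as the method anyone in the thread proposed: write the ARFF file so that each word is its own numeric attribute and each category is one row of frequencies. The numbers are then ordinary numeric values, and there is no string attribute left. The function name build_arff, the relation name, and the sport entry in the sample data are hypothetical; only the health counts come from the question.

```python
def quote(name):
    """Quote an ARFF attribute name, escaping backslashes and single quotes."""
    return "'" + name.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_arff(category_counts, relation="word_frequencies"):
    """Build ARFF text from a dict of {category: {word: frequency}}."""
    # The vocabulary is the union of all words seen in any category.
    vocab = sorted({w for counts in category_counts.values() for w in counts})
    classes = sorted(category_counts)
    lines = ["@relation " + relation, ""]
    for word in vocab:
        lines.append("@attribute " + quote(word) + " numeric")
    # Same class attribute name as in the ARFF file shown in the question.
    lines.append("@attribute @@class@@ {" + ",".join(classes) + "}")
    lines += ["", "@data"]
    for category in classes:
        counts = category_counts[category]
        # Words that do not occur in this category get frequency 0.
        values = [str(counts.get(word, 0)) for word in vocab]
        lines.append(",".join(values + [category]))
    return "\n".join(lines) + "\n"


sample_counts = {
    # Counts from the health.txt excerpt in the question.
    "health": {"اصابة": 113, "غذائية": 6, "طبيعي": 6},
    # Hypothetical placeholder entry, not real data.
    "sport": {"هدف": 40},
}
arff_text = build_arff(sample_counts)
print(arff_text)
```

With one row per category this produces only four instances, so it illustrates the file layout rather than a usable training set; a classifier would normally need many labelled documents per category, each turned into a row in the same way.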
