Hi there. I am new to this and got confused while searching for how to approach it. I want to create a sparse ARFF file for Weka for text classification, and I have been searching online for how to get started. The file must be compatible with Weka. The outline of the ARFF file should look like this:

 @relation myrelation
 @attribute att0 numeric
 @attribute att1 numeric
 @data
 {0,1,4,5 , A}
 {0,5,2,,1 B}

I have some strings, each followed by a class. Suppose my data set is as follows:

 string is a string A
 Hello a string B
 Another is string C
 .
 .
 .

Each line has the string first and then the class (A, B, C, and so on). I want to convert my data set into the sparse ARFF format mentioned above. Can somebody point me in the right direction? I want to do it in Java.

Java Nerd

1 Answer

You can use Weka's StringToWordVector filter to convert the text into a word vector (but not necessarily a sparse matrix). Take a look at my tutorial on this.
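If you just want to see what the sparse layout looks like for the sample data in the question, here is a rough, dependency-free Java sketch. It is not Weka's own implementation and does not use the StringToWordVector filter. It assumes that the last whitespace-separated token on each line is the class label, that labels are simple tokens needing no quoting, and that raw word counts are good enough as values. The class name `SparseArffSketch`, the method `toSparseArff`, and the `w_` attribute prefix are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Weka: builds sparse ARFF text by hand.
class SparseArffSketch {

    /** Converts lines of the form "some text LABEL" into sparse ARFF text. */
    public static String toSparseArff(String relation, List<String> lines) {
        Map<String, Integer> vocab = new LinkedHashMap<>();   // word -> attribute index
        TreeSet<String> labels = new TreeSet<>();             // sorted class labels
        List<Map<Integer, Integer>> rows = new ArrayList<>(); // per line: index -> count
        List<String> rowLabels = new ArrayList<>();

        for (String line : lines) {
            String trimmed = line.trim();
            int cut = trimmed.lastIndexOf(' ');
            if (cut < 0) continue; // no text/label split on this line, skip it
            String label = trimmed.substring(cut + 1);
            String text = trimmed.substring(0, cut).toLowerCase();

            // TreeMap keeps indices ascending, as the sparse format expects.
            Map<Integer, Integer> counts = new TreeMap<>();
            for (String tok : text.split("[^a-z0-9]+")) {
                if (tok.isEmpty()) continue;
                Integer idx = vocab.get(tok);
                if (idx == null) {
                    idx = vocab.size();
                    vocab.put(tok, idx);
                }
                counts.merge(idx, 1, Integer::sum);
            }
            labels.add(label);
            rows.add(counts);
            rowLabels.add(label);
        }

        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String word : vocab.keySet()) {
            // The "w_" prefix is an assumption to avoid a clash with "class".
            sb.append("@attribute w_").append(word).append(" numeric\n");
        }
        sb.append("@attribute class {").append(String.join(",", labels)).append("}\n\n");
        sb.append("@data\n");

        int classIndex = vocab.size(); // class is declared last
        for (int i = 0; i < rows.size(); i++) {
            sb.append('{');
            for (Map.Entry<Integer, Integer> e : rows.get(i).entrySet()) {
                sb.append(e.getKey()).append(' ').append(e.getValue()).append(", ");
            }
            sb.append(classIndex).append(' ').append(rowLabels.get(i)).append("}\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList(
                "string is a string A",
                "Hello a string B",
                "Another is string C");
        System.out.print(toSparseArff("myrelation", data));
    }
}
```

Note that each sparse row is a list of `index value` pairs inside braces, with zero counts left out, which is different from the comma-only rows in the question's outline. For real text classification you would probably still want the filter, since it handles tokenizing and weighting options that this sketch ignores.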

Rushdi Shams