I'm using Mallet 2.0.7 in Java to mine tweets. According to the documentation, for topic modeling I have to read the data set using CsvIterator:
Reader fileReader = new InputStreamReader(new FileInputStream(new File(args[0])), "UTF-8");
instances.addThruPipe(new CsvIterator (fileReader, Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$"),
3, 2, 1)); // data, label, name fields
My data set has these columns: row,x,location,username,hashtaghs,text,retweets,date,favorites,numberOfComment
I added column x to serve as the label. For now I want to run the algorithm on the text column only (column 6), and add other columns later. I wrote the pattern below, but it doesn't work as expected: the data group captures everything from column 6 to the end of the line. How should I change the regular expression?
Reader fileReader = new InputStreamReader(new FileInputStream(new File(filePath)), "UTF-8");
instances.addThruPipe(new CsvIterator(fileReader,
Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(\\S*)[\\s,]*(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$"),
6, 2, 1)); // data, label, name fields
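For reference, here is a minimal sketch of the behaviour I am after, written with plain java.util.regex so it can be tried without Mallet. In my attempt the final `(.*)$` is greedy and runs to the end of the line, and `\S` also matches commas, so the groups do not line up with the CSV columns. The sketch assumes that no field contains an embedded or quoted comma, which may not hold for real tweet text. The class name `TweetColumnPattern`, the method `extract`, and the sample row are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TweetColumnPattern {
    // Hypothetical pattern (an assumption, not taken from the Mallet docs):
    // group 1 = column 1 (row), group 2 = column 2 (x, the label),
    // columns 3-5 are skipped without capturing,
    // group 3 = column 6 (text); any remaining columns are ignored.
    // Assumes fields never contain commas.
    static final Pattern SIX_COLUMN = Pattern.compile(
            "^([^,]*),([^,]*),(?:[^,]*,){3}([^,]*)(?:,.*)?$");

    // Returns {name, label, data} for one CSV line, or null if the line
    // has fewer than six comma-separated columns.
    static String[] extract(String line) {
        Matcher m = SIX_COLUMN.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // Made-up sample row in the column order described above.
        String line = "1,pos,Berlin,alice,#java,learning topic models,3,2014-05-01,2,0";
        String[] fields = extract(line);
        System.out.println(fields[0] + " | " + fields[1] + " | " + fields[2]);
    }
}
```

If this is on the right track, I would expect to pass such a pattern to CsvIterator with group indices 3, 2, 1 (data, label, name) instead of 6, 2, 1, since only three groups capture.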