unable to upload CSV file for WEKA analysis - java

Question

I am working on a big data analysis project and I am stuck at this point. I am trying to load a CSV file and use the WEKA Java API to perform the analysis. I want to tokenize the text, remove stop words, identify parts of speech (POS), and filter the nouns. However, I get the error below and have no idea why. An explanation and a solution would be great!

Error: 

   Exception in thread "main" java.io.IOException: wrong number of values. Read 21, expected 20, read Token[EOL], line 3
     at weka.core.converters.ConverterUtils.errms(ConverterUtils.java:912)
     at weka.core.converters.CSVLoader.getInstance(CSVLoader.java:819)
     at weka.core.converters.CSVLoader.getDataSet(CSVLoader.java:642)

Code :

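import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

// load CSV (per the stack trace, the IOException is thrown by getDataSet())
CSVLoader loader = new CSVLoader();
loader.setSource(new File("C:\\fakepath\\CSVfilesample.csv"));
Instances data = loader.getDataSet();

// save ARFF
ArffSaver saver = new ArffSaver();
saver.setInstances(data);
saver.setFile(new File("C:\\fakepath\\CSVfilesample.arff"));
saver.setDestination(new File("C:\\fakepath\\CSVfilesample.arff"));
saver.writeBatch();

// read the ARFF back and use the last attribute as the class
BufferedReader br = new BufferedReader(new FileReader("C:\\fakepath\\CSVfilesample.arff"));
Instances train = new Instances(br);
train.setClassIndex(train.numAttributes() - 1);
br.close();

// train Naive Bayes and run 10-fold cross-validation
NaiveBayes nb = new NaiveBayes();
nb.buildClassifier(train);
Evaluation eval = new Evaluation(train);
eval.crossValidateModel(nb, train, 10, new Random(1));
System.out.println(eval.toSummaryString("\nResults\n=====\n", true));
System.out.println(eval.fMeasure(1) + " " + eval.precision(1) + " " + eval.recall(1));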

score 7 · Accepted Answer · edited Sep 16 '13 at 05:07

This error is generally caused by an incorrectly formatted data file (here, the CSV) at load time. There are a few possible reasons. Check the following points:

It is common practice to use the ARFF format instead of CSV because it has certain advantages over a CSV file. See Can I use CSV?
Next, check whether the file is encoded in UTF-8. If it is, you will have to decode the file as UTF-8 when reading it. Reference: Text Categorization with WEKA
Thirdly, check whether there are incompatible characters in your CSV, such as a %2 or something similar. Check for syntactically incorrect line endings and for any extra commas.

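This error tells you that there is a problem with the file contents: they don't follow the format WEKA expects. Here the message reports 21 values on line 3 where 20 were expected, so that line is a good place to start looking for an extra comma. Fix that and the error will disappear.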
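To find the offending rows before handing the file to WEKA, something like the following rough check might help. It is only a sketch and does not use the WEKA API: the class name CsvFieldCountCheck and its two helper methods are made up for illustration. It assumes a comma separator, double quotes as the only enclosure character, and one record per physical line, so WEKA's own parser may count fields or lines differently in edge cases.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of WEKA): flags CSV lines whose field count
// differs from the header line.
public class CsvFieldCountCheck {

    // Counts the fields on one line; commas inside double quotes are treated as data.
    public static int countFields(String line) {
        int fields = 1;
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes; // a doubled quote ("") toggles twice, leaving the state unchanged
            } else if (c == ',' && !inQuotes) {
                fields++;
            }
        }
        return fields;
    }

    // Returns the 1-based numbers of the lines whose field count differs from the header line.
    public static List<Integer> findBadLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        if (lines.isEmpty()) {
            return bad;
        }
        int expected = countFields(lines.get(0));
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            // blank lines are skipped rather than reported
            if (!line.isEmpty() && countFields(line) != expected) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CsvFieldCountCheck <file.csv>");
            return;
        }
        // readAllLines decodes as UTF-8 and throws an IOException on malformed bytes,
        // so an encoding problem shows up here as well.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (int lineNumber : findBadLines(lines)) {
            System.out.println("line " + lineNumber + ": "
                    + countFields(lines.get(lineNumber - 1)) + " fields, expected "
                    + countFields(lines.get(0)));
        }
    }
}
```

Run it with the path to the CSV as the only argument. Any line it reports has more or fewer fields than the header, which usually points to a stray comma or a text value that contains a comma but is not wrapped in quotes.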

Hope it helps. :)
