
I am using Weka from the command line, with a batch file that runs multiple classifiers for 10 folds. I run each experiment 10 times and then average the metrics (accuracy etc.). This leaves me with many output files, so evaluating them manually is time-consuming.

My question: can the predicted values be exported from the Weka command line to a CSV with separate columns? For example:

    java -cp weka.jar weka.classifiers.trees.J48 -t data/iris.arff -T data/iris.arff -p 0 > iris.txt

produces a text (or even .csv) file, but everything is in a single column, so accuracy, precision etc. cannot be computed from it. If the predicted class were in one column and the actual class in another, the other metrics would be easy to derive. The most relevant question I found has the same issue. With

    java -cp weka.jar weka.classifiers.trees.J48 -t data/iris.arff -T data/iris.arff > iris.csv

I do get a detailed output file, but again all metrics end up in one column, so averaging them is not easy.
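For illustration, once the predictions are in a two-column CSV (actual class in one column, predicted class in another), accuracy is a one-line awk pass; the preds.csv below is a made-up example, not real Weka output:

```shell
# Hypothetical predictions file: column 1 = actual, column 2 = predicted.
cat > preds.csv <<'EOF'
actual,predicted
Iris-setosa,Iris-setosa
Iris-setosa,Iris-versicolor
Iris-virginica,Iris-virginica
Iris-versicolor,Iris-versicolor
EOF
# Accuracy = matching rows / total rows (header skipped via NR > 1).
awk -F, 'NR > 1 { total++; if ($1 == $2) correct++ }
         END { printf "accuracy=%.2f\n", correct / total }' preds.csv
# prints: accuracy=0.75
```

The same pattern extends to per-class counts for precision and recall.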

wasif khan

1 Answer


The -p option has long been deprecated. Instead, use the -classifications "class + options" option (see the javadoc of the Evaluation class). If you want CSV output, you can use the following class: weka.classifiers.evaluation.output.prediction.CSV.
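A minimal sketch of the suggested invocation, assuming weka.jar and data/iris.arff sit in the current directory; the class name weka.classifiers.evaluation.output.prediction.CSV and its -file sub-option are taken from recent Weka versions, so check them against the javadoc of the version you run:

```shell
# Build the command first so it can be inspected; the -classifications value
# is the prediction-output class plus its own options, quoted as one argument.
CMD='java -cp weka.jar weka.classifiers.trees.J48 -t data/iris.arff -T data/iris.arff -classifications "weka.classifiers.evaluation.output.prediction.CSV -file iris_preds.csv"'
if [ -f weka.jar ]; then
  eval "$CMD"    # writes actual and predicted classes as CSV columns to iris_preds.csv
else
  echo "weka.jar not found; skipping: $CMD"
fi
```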

BTW, if you are thinking of comparing classifiers, you should use the Weka Experimenter. It performs 10 runs of 10-fold cross-validation by default and gives you the mean, standard deviation and significant wins/losses for the selected statistic.

fracpete
  • Thank you very much for your help. Can we provide a separate test set in the Experimenter? I am upsampling the training data, and I don't want the artificially created samples to be used in validation, i.e. I only want the validation (testing) results on the real samples, not the artificially created (upsampled) ones. Thank you for your help. – wasif khan Jul 13 '21 at 05:53
  • Unfortunately, no. You could combine the two datasets and then figure out the exact percentage that you need to split it into train/test (for an order-preserving train/test split). Any randomization unfortunately would happen before the split, i.e., using multiple runs will be pointless (as you are preserving the order). – fracpete Jul 19 '21 at 00:35
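To illustrate the "exact percentage" arithmetic mentioned in the comment above: with hypothetical set sizes, the train share of the combined (train + test) file is n_train / (n_train + n_test) * 100, which is what an order-preserving percentage split would need:

```shell
TRAIN=300   # hypothetical number of (upsampled) training instances
TEST=50     # hypothetical number of real test instances
# Percentage of the combined file occupied by the training part.
PCT=$(awk -v tr="$TRAIN" -v te="$TEST" 'BEGIN { printf "%.4f", tr / (tr + te) * 100 }')
echo "train percentage: $PCT"
# prints: train percentage: 85.7143
```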