1

How can I build a classification model with 10-fold cross-validation using the Weka API? I ask because a new classification model is created in each cross-validation run. Which of these models should I use on my test data?

Jason Aller

2 Answers

9

10-fold cross-validation is used to estimate the accuracy a classifier would have if it were built from all of the training data. It is typically used when there is not enough data to set aside an independent test set. This means that you should build a new model from all of the training data when you want to predict future data. The result of 10-fold cross-validation is an estimate of how well that new classifier should perform.
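
As a rough illustration of that workflow, here is a minimal plain-Java sketch with no Weka dependency. The class `CvWorkflowSketch`, the toy `MajorityClassifier`, the `crossValidate` helper and the labels are all hypothetical names and data made up for this example. It only shows the shape of the procedure: estimate accuracy over k folds, throw the fold models away, then train once on everything.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class CvWorkflowSketch {

    /** Toy classifier (hypothetical): always predicts the most frequent training label. */
    static class MajorityClassifier {
        private String prediction;

        void train(List<String> labels) {
            Map<String, Integer> counts = new HashMap<>();
            for (String label : labels) {
                counts.merge(label, 1, Integer::sum);
            }
            int best = -1;
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (e.getValue() > best) {
                    best = e.getValue();
                    prediction = e.getKey();
                }
            }
        }

        String predict() {
            return prediction;
        }
    }

    /** Returns the pooled accuracy (0..1) over k folds; the fold models are discarded. */
    static double crossValidate(List<String> labels, int k, long seed) {
        List<String> shuffled = new ArrayList<>(labels);
        Collections.shuffle(shuffled, new Random(seed));
        int correct = 0;
        for (int fold = 0; fold < k; fold++) {
            List<String> train = new ArrayList<>();
            List<String> test = new ArrayList<>();
            for (int i = 0; i < shuffled.size(); i++) {
                // Each instance is held out in exactly one fold.
                if (i % k == fold) {
                    test.add(shuffled.get(i));
                } else {
                    train.add(shuffled.get(i));
                }
            }
            MajorityClassifier foldModel = new MajorityClassifier();
            foldModel.train(train);
            for (String actual : test) {
                if (actual.equals(foldModel.predict())) {
                    correct++;
                }
            }
        }
        return (double) correct / shuffled.size();
    }

    public static void main(String[] args) {
        // Toy data (made up for illustration): 14 "yes" labels and 6 "no" labels.
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < 14; i++) labels.add("yes");
        for (int i = 0; i < 6; i++) labels.add("no");

        // Step 1: cross-validation yields only an accuracy estimate.
        double estimate = crossValidate(labels, 10, 1L);
        System.out.println("Estimated accuracy: " + estimate);

        // Step 2: the model used for future predictions is trained on all the data.
        MajorityClassifier finalModel = new MajorityClassifier();
        finalModel.train(labels);
        System.out.println("Final model predicts: " + finalModel.predict());
    }
}
```

None of the ten fold models is kept. They exist only to produce the estimate, which is why the question of "which fold's model to use" does not arise.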

The following code shows an example of using Weka's cross-validation through the API, and then building a new model from the entirety of the training dataset.

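    import java.util.Random;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;

    // Training instances are held in "originalTrain" (a weka.core.Instances with its class index set)

    // Estimate accuracy with 10-fold cross-validation
    Classifier c1 = new NaiveBayes();
    Evaluation eval = new Evaluation(originalTrain);
    eval.crossValidateModel(c1, originalTrain, 10, new Random(1));
    System.out.println("Estimated Accuracy: " + eval.pctCorrect());

    // Train a new classifier on all of the training data
    Classifier c2 = new NaiveBayes();
    c2.buildClassifier(originalTrain);  // predict with this model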
Walter
  • Do you know how to get the results for each fold? Say I want all ten `pctCorrect()` values, one per fold, from an `Evaluation` object after calling `.crossValidateModel()`. –  Jan 30 '14 at 14:30
  • I am not sure how to do this. I saw your other [question](http://stackoverflow.com/questions/21458923/get-results-of-cross-validation-per-fold), and it looks like you are on the right track by trying to do the cross validation on your own. – Walter Feb 03 '14 at 13:52
  • Oh that's okay, and I appreciate that you OK'd my method :D Thank you for the reply! –  Feb 03 '14 at 14:09
0

Perform cross-validation with:

    // Assumes "randData" is a randomized copy of the data set and "folds" is the number of folds (e.g. 10).
    // Needs: java.io.File, weka.core.Instances, weka.core.converters.ArffSaver
    for (int n = 0; n < folds; n++) {
        // Get the training and test split for fold n
        Instances train = randData.trainCV(folds, n);
        Instances test = randData.testCV(folds, n);

        // Save the training split (the same file is overwritten on every fold)
        ArffSaver saver = new ArffSaver();
        saver.setInstances(train);
        saver.setFile(new File("C:\\Users\\AmeerSameer\\Desktop\\mytrain.arff"));
        saver.writeBatch();

        // Save the test split (also overwritten on every fold)
        ArffSaver saver1 = new ArffSaver();
        saver1.setInstances(test);
        saver1.setFile(new File("C:\\Users\\AmeerSameer\\Desktop\\mytest.arff"));
        saver1.writeBatch();

        System.out.println("No of folds done = " + (n + 1));
    }
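
For intuition about what a `trainCV(folds, n)` / `testCV(folds, n)` pair hands back, the sketch below splits a range of row indices into one held-out block and the remainder. It is plain Java with hypothetical names (`FoldSplitSketch`, `foldStart`, `testFold`, `trainFold`) and rests on an assumption about the general idea rather than being a copy of Weka's implementation: fold n takes one contiguous block of rows, and the first `size % folds` folds each get one extra row.

```java
import java.util.ArrayList;
import java.util.List;

public class FoldSplitSketch {

    /** First row index of the test block for fold n; earlier folds absorb the remainder. */
    static int foldStart(int size, int folds, int n) {
        int base = size / folds;
        int extra = size % folds;
        return n * base + Math.min(n, extra);
    }

    /** Row indices held out for testing in fold n. */
    static List<Integer> testFold(int size, int folds, int n) {
        List<Integer> rows = new ArrayList<>();
        int from = foldStart(size, folds, n);
        int to = foldStart(size, folds, n + 1);
        for (int i = from; i < to; i++) {
            rows.add(i);
        }
        return rows;
    }

    /** Row indices used for training in fold n: everything outside the test block. */
    static List<Integer> trainFold(int size, int folds, int n) {
        List<Integer> rows = new ArrayList<>();
        int from = foldStart(size, folds, n);
        int to = foldStart(size, folds, n + 1);
        for (int i = 0; i < size; i++) {
            if (i < from || i >= to) {
                rows.add(i);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        int size = 23;   // made-up number of rows
        int folds = 10;
        for (int n = 0; n < folds; n++) {
            System.out.println("fold " + n
                    + ": train=" + trainFold(size, folds, n).size()
                    + " test=" + testFold(size, folds, n));
        }
    }
}
```

Every row lands in exactly one test block, so the ten test splits together cover the data set once. That is also why the data should be randomized (and, for nominal classes, ideally stratified) before splitting.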
Greenonline