Using WEKA Filters in Java - Oversampling and Undersampling

Question

I can't work out how to use WEKA filters from Java code. The help I've found seems a little dated, as I'm using WEKA 3.8.5. I'm running 3 tests. Test 1: no filter; Test 2: weka.filters.supervised.instance.SpreadSubsample -M 1.0; Test 3: weka.filters.supervised.instance.Resample -B 1.0 -Z 130.3.

If my research is correct, I should import the filters like this. What I'm lost on is how to set "-M 1.0" for SpreadSubsample (my undersampling test) and "-B 1.0 -Z 130.3" for Resample (my oversampling test).

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.supervised.instance.Resample; 
import weka.filters.supervised.instance.SpreadSubsample;

And I have Test 1 (my no-filter test) coded below:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;


public class Fraud {
    public static void main(String[] args) {
        try {
            // Create the J48 decision tree classifier
            J48 j48Classifier = new J48();

            // Path to the dataset
            String fraudDataset = "C:\\Users\\Owner\\Desktop\\CreditCard\\CreditCard.arff";
            BufferedReader bufferedReader = new BufferedReader(new FileReader(fraudDataset));

            // Load the dataset instances
            Instances datasetInstances = new Instances(bufferedReader);

            // The class attribute is the last attribute
            datasetInstances.setClassIndex(datasetInstances.numAttributes() - 1);

            Evaluation evaluation = new Evaluation(datasetInstances);

            // Cross-validate the model with 10 folds
            evaluation.crossValidateModel(j48Classifier, datasetInstances, 10, new Random(1));
            System.out.println(evaluation.toSummaryString("\nResults", false));

            // Reached only if loading and evaluation succeeded
            System.out.print("DT Successfully executed.");
        } catch (Exception e) {
            System.out.println("Error occurred!\n" + e.getMessage());
        }
    }
}

The output of my code is:
Results
Correctly Classified Instances      284649               99.9445 %
Incorrectly Classified Instances       158                0.0555 %
Kappa statistic                          0.8257
Mean absolute error                      0.0008
Root mean squared error                  0.0232
Relative absolute error                 24.2995 %
Root relative squared error             55.9107 %
Total Number of Instances           284807     

DT Successfully executed.

Does anyone have an idea on how I can add the filters and the settings I want for the filters to the code for Test 2 and 3? Any help will be appreciated. I will run the 3 tests multiple times and compare the results. I want to see what works best of the 3.

What documentation are you referring to as *outdated*? Weka's API has been stable for a long time (in other words, it doesn't break code all the time!). Documentation on using the API is either available from the Weka manual that comes with your Weka installation or from the [Weka wiki](https://waikato.github.io/weka-wiki/using_the_api/). Your Weka installation also has an archive with code examples. — fracpete, Dec 06 '21 at 19:54
I reviewed other Stack Overflow questions and saw a few comments saying it was outdated. Maybe I was wrong. — Damon Green, Dec 09 '21 at 17:52

score 1 · Answer 1 · answered Dec 06 '21 at 20:04

-M 1.0 and -B 1.0 -Z 130.3 are the options that you would supply to the filters on the command line. These filters implement the weka.core.OptionHandler interface, which offers the setOptions and getOptions methods for setting and reading the same options from code.

For example, SpreadSubsample can be instantiated like this:

import weka.filters.supervised.instance.SpreadSubsample;
import weka.core.Utils;
...
SpreadSubsample spread = new SpreadSubsample();
// Utils.splitOptions generates an array from an option string
spread.setOptions(Utils.splitOptions("-M 1.0"));
// alternatively:
// spread.setOptions(new String[]{"-M", "1.0"});
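To make the array form concrete, here is a minimal plain-Java sketch of what such an option array looks like for both filters. The class OptionTokens and its tokenize method are hypothetical stand-ins that split on whitespace only; they are not part of Weka, and the real Utils.splitOptions additionally handles quoted arguments.

```java
import java.util.Arrays;

public class OptionTokens {

    // Hypothetical, simplified stand-in for weka.core.Utils.splitOptions:
    // splits on whitespace only and does not handle quotes or escapes.
    static String[] tokenize(String optionString) {
        String trimmed = optionString.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // Each flag and each value becomes its own array element.
        System.out.println(Arrays.toString(tokenize("-M 1.0")));
        System.out.println(Arrays.toString(tokenize("-B 1.0 -Z 130.3")));
    }
}
```

The point to take away is that a flag and its value are separate elements, so "-B 1.0 -Z 130.3" corresponds to a four-element array rather than two.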

In order to apply the filters, you should use the FilteredClassifier approach. E.g., for SpreadSubsample you would do something like this:

import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.filters.supervised.instance.SpreadSubsample;
import weka.core.Utils;
...
// base classifier
J48 j48 = new J48();
// filter
SpreadSubsample spread = new SpreadSubsample();
spread.setOptions(Utils.splitOptions("-M 1.0"));
// meta-classifier
FilteredClassifier fc = new FilteredClassifier();
fc.setFilter(spread);
fc.setClassifier(j48);

And then evaluate the fc classifier object on your dataset.
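Putting the pieces together, the three tests from the question could be sketched roughly as follows. This is only an illustration under some assumptions: the class name FilterComparison and its helper methods are made up for this example, the ARFF path is taken from the first command-line argument, and the class attribute is assumed to be the last one, as in the question. It needs weka.jar on the classpath.

```java
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.instance.Resample;
import weka.filters.supervised.instance.SpreadSubsample;

// Hypothetical class name; the helpers below are illustrative, not part of Weka.
public class FilterComparison {

    // Test 2 filter: undersampling with the options from the question.
    static SpreadSubsample undersampler() throws Exception {
        SpreadSubsample spread = new SpreadSubsample();
        spread.setOptions(Utils.splitOptions("-M 1.0"));
        return spread;
    }

    // Test 3 filter: oversampling with the options from the question.
    static Resample oversampler() throws Exception {
        Resample resample = new Resample();
        resample.setOptions(Utils.splitOptions("-B 1.0 -Z 130.3"));
        return resample;
    }

    // Wraps a fresh J48 and the given filter in a FilteredClassifier.
    static FilteredClassifier wrap(Filter filter) {
        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(filter);
        fc.setClassifier(new J48());
        return fc;
    }

    // Runs 10-fold cross-validation with its own Evaluation object.
    static void evaluate(String label, Classifier classifier, Instances data) throws Exception {
        Evaluation evaluation = new Evaluation(data);
        evaluation.crossValidateModel(classifier, data, 10, new Random(1));
        System.out.println(evaluation.toSummaryString("\n" + label, false));
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("Usage: java FilterComparison <path-to-arff>");
            return;
        }
        Instances data = DataSource.read(args[0]);
        data.setClassIndex(data.numAttributes() - 1); // assumes class is the last attribute

        evaluate("Test 1: no filter", new J48(), data);
        evaluate("Test 2: SpreadSubsample -M 1.0", wrap(undersampler()), data);
        evaluate("Test 3: Resample -B 1.0 -Z 130.3", wrap(oversampler()), data);
    }
}
```

Each test gets its own Evaluation object because an Evaluation accumulates statistics across calls; reusing one instance for several classifiers would mix their results into a single summary.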

Thanks! This is my attempt to do the same with Resample:

Resample resam = new Resample();
resam.setOptions(Utils.splitOptions("-B 1.0 -Z 130.3"));
// meta-classifier
FilteredClassifier rc = new FilteredClassifier();
rc.setFilter(resam);
rc.setClassifier(j48Classifier);

Then I evaluated it with:

evaluation.crossValidateModel(rc, datasetInstances, 10, new Random(1));
System.out.println(evaluation.toSummaryString("\nResults2", false));

— Damon Green, Dec 09 '21 at 17:48
