
I am working with Weka and need to output the prediction values (probabilities) of each label for each test instance.

In the GUI there is an option in the Classify tab (Classify -> Options -> Output predicted value) that outputs the prediction probabilities for each label, but how do I do this in Java code? I want to receive the probability score for each label after classifying an instance.

Kashif Khan

3 Answers


The following code trains a classifier on a set of training instances and outputs the predicted probability of each class label for one specific instance.


import weka.classifiers.trees.J48;
import weka.core.Instances;

public class Main {

    public static void main(String[] args) throws Exception
    {
        //load training instances, e.g. with weka.core.converters.ConverterUtils.DataSource.read(...)
        Instances train = ...;

        //Weka must know which attribute is the class (here: the last one),
        //otherwise buildClassifier fails
        train.setClassIndex(train.numAttributes() - 1);

        //build a J48 decision tree
        J48 model = new J48();
        model.buildClassifier(train);

        //decide which instance you want to predict
        int s1 = 2;

        //get the predicted probabilities (one entry per class label)
        double[] prediction = model.distributionForInstance(train.instance(s1));

        //output predictions
        for (int i = 0; i < prediction.length; i++)
        {
            System.out.println("Probability of class " +
                               train.classAttribute().value(i) +
                               " : " + Double.toString(prediction[i]));
        }

    }

}

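The method "distributionForInstance" returns one probability per class label. Classifiers that cannot estimate probabilities still implement it, but they typically return 1.0 for the predicted class and 0.0 for all others. You can read up on it in the Weka API documentation for weka.classifiers.Classifier.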
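The code above predicts an instance taken from the training data itself. For the original question (probabilities for every instance of a separate test set), a minimal sketch could look like the following. It assumes Weka 3.7 or later on the classpath, a nominal class stored as the last attribute, and training and test files with identical headers passed as the first two command-line arguments; `PredictTestSet` and `distributions` are only illustrative names.

```java
import weka.classifiers.Classifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class PredictTestSet {

    // Returns one row per test instance; row i holds P(label) for every class label.
    static double[][] distributions(Classifier model, Instances test) throws Exception {
        double[][] result = new double[test.numInstances()][];
        for (int i = 0; i < test.numInstances(); i++) {
            result[i] = model.distributionForInstance(test.instance(i));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: args[0] is the training file, args[1] the test file (same header).
        Instances train = DataSource.read(args[0]);
        Instances test = DataSource.read(args[1]);
        // Assumption: the class is the last attribute in both files.
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        Classifier model = new J48();
        model.buildClassifier(train);

        // Print "instance i: label1=p1 label2=p2 ..." for each test instance.
        double[][] dist = distributions(model, test);
        for (int i = 0; i < dist.length; i++) {
            StringBuilder line = new StringBuilder("instance " + i + ":");
            for (int c = 0; c < dist[i].length; c++) {
                line.append(' ').append(test.classAttribute().value(c))
                    .append('=').append(dist[i][c]);
            }
            System.out.println(line);
        }
    }
}
```

Keeping the scoring loop in its own method makes it easy to reuse the probabilities (e.g. for thresholds) instead of only printing them.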

Walter
    Thanks @walter, I really appreciate your help. Is there a way to get a vector of the same dimension for test documents as for the training samples, by applying the training vocabulary to the test samples via the StringToWordVector functionality in Weka? – Kashif Khan Dec 17 '13 at 07:55
    I'm not familiar with text mining in Weka, so I'm not well equipped to answer your question. You may try posting a new question on Stack Overflow. – Walter Dec 17 '13 at 17:14
    Also, if this has answered your question, it's customary to click the check mark under the up/down vote arrows (so that it turns green). This lets people know the question has been answered, and it gives me some credit for answering it. I only mention it because it looks like you are new to Stack Overflow. – Walter Dec 17 '13 at 17:14

I think I found the solution.

The training set and the test set must be compatible: same header, same attribute names, same attribute order; only the values change. That raises the question: why do I have to put the class in the test set if I don't know it, when that is precisely what I want to obtain? The method does need the class attribute to be present, but when you look at classModel.distributionForInstance(dataModel.instance(0)), it gives you the prediction for your classes as an array of doubles. So the class attribute has to be declared in the test set (its value can be left missing, written as ? in an ARFF file), and distributionForInstance then gives you the real result for your classes.
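To score a single new instance whose label is unknown, one option is to build it against the training header and leave the class value missing. The sketch below assumes Weka 3.7 or later (for `DenseInstance`), a header whose class index is already set, and feature values supplied as doubles in attribute order (nominal values as their index); `UnlabeledInstance`, `unlabeled` and `predict` are hypothetical names.

```java
import weka.classifiers.Classifier;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.Utils;

public class UnlabeledInstance {

    // Builds an instance on the training header with the class value set to missing ("?" in ARFF).
    // Assumption: featureValues lists the non-class attributes in header order
    // (nominal values given as their index).
    static Instance unlabeled(Instances header, double[] featureValues) {
        double[] vals = new double[header.numAttributes()];
        int f = 0;
        for (int a = 0; a < header.numAttributes(); a++) {
            vals[a] = (a == header.classIndex()) ? Utils.missingValue() : featureValues[f++];
        }
        Instance inst = new DenseInstance(1.0, vals);
        inst.setDataset(header); // links the instance to the attribute definitions and class index
        return inst;
    }

    // Class probabilities for one unlabeled instance, using an already-built model.
    static double[] predict(Classifier model, Instances header, double[] featureValues)
            throws Exception {
        return model.distributionForInstance(unlabeled(header, featureValues));
    }
}
```

Passing `new Instances(train, 0)` as the header keeps an empty copy of the training structure, so the training data itself does not need to be kept around.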

Txus Lopez

In the WEKA GUI (Explorer), go to the Classify panel, press "More options...", then under "Output predictions" choose "PlainText". Now left-click on "PlainText" and set "outputDistribution" to "True".

Note that this works in recent WEKA versions, e.g., WEKA 3.8.0.
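The same PlainText output with distributions can, I believe, also be produced from Java code. A minimal sketch, assuming Weka 3.7 or later (where `weka.classifiers.evaluation.output.prediction.PlainText` exists), a model that has already been built, and training/test sets with the class index set; `PlainTextPredictions` is an illustrative name:

```java
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.evaluation.output.prediction.PlainText;
import weka.core.Instances;

public class PlainTextPredictions {

    // Mirrors the GUI setting: PlainText output with outputDistribution = true.
    // Assumption: model.buildClassifier(train) has already been called.
    static String predictionsWithDistribution(Classifier model, Instances train, Instances test)
            throws Exception {
        StringBuffer buffer = new StringBuffer(); // PlainText writes its lines into this buffer
        PlainText output = new PlainText();
        output.setHeader(train);            // attribute definitions used to print labels
        output.setBuffer(buffer);
        output.setOutputDistribution(true); // same as "outputDistribution = True" in the GUI

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(model, test, output); // one output line per test instance
        return buffer.toString();
    }
}
```

Printing the returned string should give output similar to what the Explorer shows in its classifier output panel.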

Regards,
Martin

Martin L