Deeplearning on Spark pipeline: How to predict using a neural network model in a pipeline?

Question

I am trying to add a sentiment analysis program to a Spark pipeline. To do so, I have a class that extends org.apache.spark.ml.PredictionModel, which requires overriding the predict() method that predicts the label for a given feature vector. However, the model returns the same label (either always 0 or always 1) every time I execute this code. For example, if there are 10 movie reviews, five positive and five negative, it classifies all of them as negative. I have attached the code below.

import org.apache.spark.ml.PredictionModel;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.mllib.linalg.DenseVector;
import org.apache.spark.mllib.linalg.Vector;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.api.buffer.DataBuffer;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import java.io.*;

//Model produced by a ProbabilisticClassifier
public class MovieReviewClassifierModel extends PredictionModel<Object, MovieReviewClassifierModel> implements Serializable {

    private static final long serialVersionUID = 1L;
    private MultiLayerNetwork net;

    MovieReviewClassifierModel(MultiLayerNetwork net) throws Exception {
        this.net = net;
    }

    @Override
    public MovieReviewClassifierModel copy(ParamMap args0) {
        return null;
    }

    @Override
    public String uid() {
        return "MovieReviewClassifierModel";
    }


    // Given a vector of raw predictions, select the predicted label.
    public double raw2prediction(Vector rawPrediction) {
        return rawPrediction.toArray()[0];
    }

    @Override
    public double predict(Object o) {

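        // Convert the Spark feature vector into an ND4J array.
        DenseVector v = (DenseVector) o;
        double[] a = v.toArray();
        INDArray arr = Nd4j.create(a);
        // Forward pass with train = false; the output holds one value per class.
        INDArray array = net.output(arr, false);
        DataBuffer ob = array.data();
        double[] d = ob.asDouble();
        double zeroProbability = d[0];
        double oneProbability = d[1];
        // Pick the label with the larger output value (ties go to 1).
        int prediction;
        if (zeroProbability > oneProbability) {
            prediction = 0;
        } else {
            prediction = 1;
        }
        return prediction;
    }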
}

Can you give me reasons for the wrong predictions?

score 0 · Answer 1 · answered Mar 08 '16 at 07:56


In public double predict(Object o) you have the following if statement:

if (zeroProbability > oneProbability) {
    prediction = 0;
} else {
    prediction = 1;
}

which makes the method return only 0 or 1. Change this method if you need other prediction values.
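A minimal sketch of one way to make that decision rule adjustable, assuming the two values read from the network are class probabilities. `LabelSelector`, `labelFor`, and the `threshold` parameter are hypothetical names used for illustration; they are not part of Spark or Deeplearning4j.

```java
// Hypothetical helper for illustration only; not a Spark or Deeplearning4j API.
final class LabelSelector {

    private LabelSelector() {
    }

    /**
     * Returns 1.0 when the class-1 probability reaches the threshold, else 0.0.
     * Assumes probabilities[0] belongs to label 0 and probabilities[1] to label 1.
     */
    static double labelFor(double[] probabilities, double threshold) {
        if (probabilities == null || probabilities.length != 2) {
            throw new IllegalArgumentException("expected exactly two class probabilities");
        }
        return probabilities[1] >= threshold ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        // Made-up network output, used only to show the call.
        double[] output = {0.35, 0.65};
        System.out.println(labelFor(output, 0.5));
    }
}
```

Inside predict(), the array `d` could be passed to such a helper in place of the if statement. Whether that changes the result depends on what the network actually outputs.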


user987339


The problem is that if we have a dataset of 10 movie reviews, 5 positive and 5 negative, and we use 1 to indicate positive and 0 to indicate negative, then the model should predict both 1 and 0. But it predicts only 0 for all reviews, positive and negative alike. – Thamali Wijewardhana Mar 08 '16 at 08:44
Then you should introduce, for instance, a threshold of 0.5 so that both 0 and 1 are predicted. Change the if statement accordingly. – user987339 Mar 08 '16 at 08:51
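
If the two outputs are softmax probabilities that sum to 1 (an assumption; the thread does not show the network configuration), then comparing oneProbability against 0.5 selects the same label as comparing it against zeroProbability, so a threshold alone may not change the result. A rough sketch of a check that could help tell whether the network itself returns nearly identical outputs for every review follows. `OutputSpread`, `classOneSpread`, `looksConstant`, and the tolerance value are hypothetical and not part of any library.

```java
// Hypothetical diagnostic for illustration only; not a Spark or Deeplearning4j API.
final class OutputSpread {

    private OutputSpread() {
    }

    /** Largest minus smallest class-1 value across the collected outputs. */
    static double classOneSpread(double[][] outputs) {
        if (outputs == null || outputs.length == 0) {
            throw new IllegalArgumentException("need at least one output row");
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double[] row : outputs) {
            min = Math.min(min, row[1]);
            max = Math.max(max, row[1]);
        }
        return max - min;
    }

    /** True when the class-1 values differ by no more than the tolerance. */
    static boolean looksConstant(double[][] outputs, double tolerance) {
        return classOneSpread(outputs) <= tolerance;
    }

    public static void main(String[] args) {
        // Made-up outputs standing in for values collected from predict().
        double[][] collected = {{0.91, 0.09}, {0.90, 0.10}, {0.92, 0.08}};
        System.out.println(looksConstant(collected, 0.05));
    }
}
```

If the spread is close to zero across clearly different reviews, the cause is more likely in the features or the trained network than in the if statement.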
