I am trying to add a sentiment analysis program to a Spark pipeline. To do so, I have a class that extends org.apache.spark.ml.PredictionModel. When extending PredictionModel, I have to override the predict() method, which predicts the label for a given feature vector. However, when I execute this code I get the same label (either 0 or 1) for every input. For example, with 10 movie reviews, five positive and five negative, it classifies all of them as negative. The code is below.

import org.apache.spark.ml.PredictionModel;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.mllib.linalg.DenseVector;
import org.apache.spark.mllib.linalg.Vector;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.api.buffer.DataBuffer;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

import java.io.Serializable;

// Model produced by a ProbabilisticClassifier
public class MovieReviewClassifierModel
        extends PredictionModel<Object, MovieReviewClassifierModel>
        implements Serializable {

    private static final long serialVersionUID = 1L;

    private MultiLayerNetwork net;

    MovieReviewClassifierModel(MultiLayerNetwork net) throws Exception {
        this.net = net;
    }

    @Override
    public MovieReviewClassifierModel copy(ParamMap args0) {
        return null;
    }

    @Override
    public String uid() {
        return "MovieReviewClassifierModel";
    }

    // Given a vector of raw predictions, select the predicted label
    public double raw2prediction(Vector rawPrediction) {
        return rawPrediction.toArray()[0];
    }

    @Override
    public double predict(Object o) {
        int prediction = 0;
        DenseVector v = (DenseVector) o;
        double[] a = v.toArray();
        INDArray arr = Nd4j.create(a);
        INDArray array = net.output(arr, false);
        DataBuffer ob = array.data();
        double[] d = ob.asDouble();
        double zeroProbability = d[0];
        double oneProbability = d[1];
        if (zeroProbability > oneProbability) {
            prediction = 0;
        } else {
            prediction = 1;
        }
        return prediction;
    }
}
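
To make the decision rule easier to check in isolation, here is a minimal sketch of what predict() does once the network has returned its output, written in plain Java without Spark or DL4J. The class name DecisionRuleSketch, the method predictLabel, and the sample probabilities are made up for illustration only; they are not part of my pipeline and not real network output.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Spark or DL4J): isolates the comparison used in predict().
class DecisionRuleSketch {

    // Returns 0.0 if the first class probability is strictly larger, otherwise 1.0.
    static double predictLabel(double[] probabilities) {
        if (probabilities.length != 2) {
            throw new IllegalArgumentException(
                    "expected 2 class probabilities, got " + probabilities.length);
        }
        return probabilities[0] > probabilities[1] ? 0.0 : 1.0;
    }

    public static void main(String[] args) {
        // Made-up outputs standing in for net.output(...); illustration only.
        double[][] outputs = { {0.9, 0.1}, {0.2, 0.8}, {0.5, 0.5} };
        for (double[] out : outputs) {
            System.out.println(Arrays.toString(out) + " -> " + predictLabel(out));
        }
    }
}
```

As far as I can tell, this comparison can only return the same label for every review if the network output is always larger at the same index, so printing d[0] and d[1] for each review should show whether the constant result comes from the network output rather than from the comparison itself.
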
What could be causing these wrong predictions?