I am using Weka for machine learning and would like to predict different behaviors using a multilayer perceptron. First I apply a min-max normalization and randomize the order of the data. I did this for the whole dataset in Weka itself, not programmatically in Java; the code given here is only an example of how it would look for the training data. Then I split the data: 60% training data, 20% cross-validation data and 20% test data. After that I create the multilayer perceptron model. The program starts with the preprocessing:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.Arrays;
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Normalize;
import weka.filters.unsupervised.instance.Randomize;

// The class declaration was not part of the posted code; the name is a placeholder.
public class MlpExample {
    public static void main(String[] args) throws Exception {
        String filepath = "...Training60%.arff";
        FileReader trainreader = new FileReader(filepath);
        Instances train = new Instances(trainreader);
        train.setClassIndex(train.numAttributes() - 1);

        // Min-max normalization of the attributes in the training data
        // to values between 0 and 1
        Normalize normalize = new Normalize();
        normalize.setInputFormat(train);
        Instances normalizedData = Filter.useFilter(train, normalize);
        FileWriter fwriter1 = new FileWriter("...OutputJavaNormalize.arff");
        fwriter1.write(normalizedData.toString());
        fwriter1.close();
        System.out.println("Done");

        // Randomly shuffle the order of the given instances (the normalized data)
        Randomize randomize = new Randomize();
        randomize.setInputFormat(normalizedData);
        Instances randomizedData = Filter.useFilter(normalizedData, randomize);
        FileWriter fwriter2 = new FileWriter("...OutputJavaRandomize.arff");
        System.out.println("End");
        fwriter2.write(randomizedData.toString());
        fwriter2.close();
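To make the normalization step concrete, here is a minimal plain-Java sketch (no Weka) of min-max scaling, (x - min) / (max - min), where the ranges are learned from one data set and then reused for another. The class name, method names and numbers are made up for illustration; this is not how Weka's `Normalize` filter is implemented, only the arithmetic it is documented to perform.

```java
import java.util.Arrays;

class MinMaxSketch {
    /** Learns the per-column minimum (row 0) and maximum (row 1) from a data set. */
    static double[][] fitRanges(double[][] rows) {
        int cols = rows[0].length;
        double[] min = new double[cols];
        double[] max = new double[cols];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : rows) {
            for (int c = 0; c < cols; c++) {
                min[c] = Math.min(min[c], row[c]);
                max[c] = Math.max(max[c], row[c]);
            }
        }
        return new double[][] {min, max};
    }

    /** Scales rows with previously learned ranges: (x - min) / (max - min). */
    static double[][] apply(double[][] rows, double[][] ranges) {
        double[][] out = new double[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            out[r] = new double[rows[r].length];
            for (int c = 0; c < rows[r].length; c++) {
                double span = ranges[1][c] - ranges[0][c];
                // A constant column has span 0; map it to 0 to avoid dividing by zero.
                out[r][c] = span == 0 ? 0.0 : (rows[r][c] - ranges[0][c]) / span;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] train = {{10, 1}, {20, 3}, {30, 5}}; // hypothetical training rows
        double[][] test = {{15, 2}, {40, 5}};           // hypothetical test rows
        double[][] ranges = fitRanges(train);
        System.out.println(Arrays.deepToString(apply(train, ranges)));
        System.out.println(Arrays.deepToString(apply(test, ranges)));
    }
}
```

In this sketch a test value outside the training range (40 in the first column) ends up above 1, which shows that the scaled values depend on which data set the ranges were taken from.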
Then I create the multilayer perceptron model and do the cross-validation:
        // MultilayerPerceptron model
        MultilayerPerceptron mlp = new MultilayerPerceptron();
        // Set the parameters
        mlp.setLearningRate(0.1);
        mlp.setMomentum(0.2);
        mlp.setTrainingTime(2000);
        mlp.setSeed(1);
        mlp.setValidationThreshold(20);
        mlp.setHiddenLayers("9");
        // Train on the normalized, randomized training data and save the model
        mlp.buildClassifier(randomizedData);
        weka.core.SerializationHelper.write(".../MLPa753", mlp);
        System.out.println("Model created");

        // Load the cross-validation set (20% of the whole data)
        Instances datapredict = new Instances(new BufferedReader(new FileReader(
                "...CrossValid_20%.arff")));
        datapredict.setClassIndex(datapredict.numAttributes() - 1);
        // 5-fold cross-validation on the cross-validation set
        Evaluation eval = new Evaluation(randomizedData);
        eval.crossValidateModel(mlp, datapredict, 5, new Random(1));
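As a rough illustration of what a 5-fold cross-validation does with the data it is given: the rows are partitioned into k folds, and for each fold a model is trained on the other k-1 folds and tested on the held-out fold. The sketch below is plain Java without Weka; the round-robin fold assignment and all names are assumptions for illustration. Weka's own `crossValidateModel` is documented to shuffle the data and stratify nominal classes first, and to train copies of the classifier on the folds.

```java
import java.util.ArrayList;
import java.util.List;

class KFoldSketch {
    /** Row indices held out for testing in the given fold (round-robin assignment). */
    static List<Integer> testIndices(int n, int k, int fold) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (i % k == fold) {
                idx.add(i);
            }
        }
        return idx;
    }

    /** Row indices used for training in the given fold: everything not held out. */
    static List<Integer> trainIndices(int n, int k, int fold) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (i % k != fold) {
                idx.add(i);
            }
        }
        return idx;
    }

    public static void main(String[] args) {
        int n = 10; // hypothetical number of rows in the cross-validation set
        int k = 5;  // number of folds, as in the call above
        for (int fold = 0; fold < k; fold++) {
            System.out.println("fold " + fold
                    + ": train on " + trainIndices(n, k, fold)
                    + ", test on " + testIndices(n, k, fold));
        }
    }
}
```

Every row is used for testing exactly once, and the statistics of all folds are pooled into one summary and one confusion matrix.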
After that I load the test data, predict the class value and probability for each instance, and save the result:
        // Evaluation/prediction of unlabeled data (20% of the whole data)
        Instances datapredict1 = new Instances(new BufferedReader(new FileReader(
                "D:...TestSet_20%.arff")));
        datapredict1.setClassIndex(datapredict1.numAttributes() - 1);
        Instances predicteddata1 = new Instances(datapredict1);
        FileWriter fwriter11 = new FileWriter(".../output.arff");
        for (int i1 = 0; i1 < datapredict1.numInstances(); i1++) {
            // classifyInstance returns the index of the predicted class as a double
            double clsLabel1 = mlp.classifyInstance(datapredict1.instance(i1));
            predicteddata1.instance(i1).setClassValue(clsLabel1);
            // Write the test instance (not the training instance) with its prediction,
            // one record per line
            String s = datapredict1.instance(i1) + "," + clsLabel1;
            fwriter11.write(s + System.lineSeparator());
            System.out.println(s);
        }
        fwriter11.close();
        System.out.println(eval.toClassDetailsString());
        System.out.println(eval.toMatrixString());
        System.out.println(eval.toSummaryString()); // Summary of the cross-validation
        System.out.println(Arrays.toString(mlp.getOptions()));
    }
}
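The text above mentions saving the predicted value together with its probability. In Weka, `classifyInstance` returns the index of the predicted class as a `double`, `distributionForInstance` returns one probability per class, and `classAttribute().value(index)` gives the label name for an index. The following plain-Java sketch (no Weka) shows how such an output line could be assembled from a probability distribution; the labels other than "Value1", the feature string and all numbers are made up for illustration.

```java
import java.util.Locale;

class PredictionRowSketch {
    /** Index of the largest probability, i.e. the predicted class index. */
    static int argMax(double[] dist) {
        int best = 0;
        for (int i = 1; i < dist.length; i++) {
            if (dist[i] > dist[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Builds one output line: features, predicted label, probability of that label. */
    static String formatRow(String features, String[] labels, double[] dist) {
        int idx = argMax(dist);
        return String.format(Locale.ROOT, "%s,%s,%.3f", features, labels[idx], dist[idx]);
    }

    public static void main(String[] args) {
        String[] labels = {"Value1", "Value2", "Value3"}; // hypothetical class labels
        double[] dist = {0.10, 0.75, 0.15};               // hypothetical class probabilities
        System.out.println(formatRow("0.2,0.9", labels, dist)); // prints 0.2,0.9,Value2,0.750
    }
}
```

Writing the probability next to the label makes it easier to see whether a model that always outputs the same class does so with high confidence or only by a narrow margin.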
When I look at the confusion matrix,
the model looks quite OK. The summary looks like this:
That looks OK too. But in the output file where the predictions are stored, "Value1" is predicted for every record. What is the reason for this, and how can I change it?