I am trying to use the Encog library as a function approximator for a reinforcement learning problem. More precisely, I am trying to get a multilayer perceptron (BasicNetwork) up and running. Since my agent explores the world according to whatever RL algorithm I choose, I cannot build a BasicNeuralDataSet up front as shown in the XOR example. I probably have to use the pause() and resume() functions, but since I cannot find any documentation or examples for them, I am somewhat lost as to how to use them (if they even work in my version; I'm not sure after reading the answer to the question in the second link).
I am using Java and the encog-core-2.5.3 jar. My current approach looks like this:
BasicNetwork network = new BasicNetwork();
network.addLayer(new BasicLayer(null, true, 2));
network.addLayer(new BasicLayer(new ActivationTANH(), true, 4));
network.addLayer(new BasicLayer(new ActivationTANH(), true, 1));
network.getStructure().finalizeStructure();
network.reset();

TrainingContinuation cont = null;
double error = 0;

do {
    int rnd = random.nextInt(trainInputs.length);
    NeuralDataSet trainingSet = new BasicNeuralDataSet(
            new double[][] { trainInputs[rnd] },
            new double[][] { trainOutputs[rnd] });

    Backpropagation train = new Backpropagation(network, trainingSet);

    // train the neural network
    if (cont != null) {
        train.resume(cont);
    }
    train.iteration();
    cont = train.pause();

    error = train.getError();
} while (error > 0.01);
This is obviously a minimal example in which I just draw random data points from a toy sample (XOR). The MLP does not converge. The logged errors look completely random, so I assume that the trainer is somehow being reset and that my pause/resume approach is not implemented correctly.
P.S.: I am not bound to Encog and can use any framework, so I would also appreciate sample code that fulfills my requirements. So far I have tried Weka and Neuroph, but both seem to lack real online learning, where training can be triggered whenever a new sample is available. (It also has to be possible to classify samples at any time.)
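To make the requirement concrete, here is a minimal sketch of the kind of interface I have in mind, written in plain Java without any library. The class name OnlineMlp and its methods train() and predict() are hypothetical and are not part of Encog, Weka, or Neuroph. It assumes one tanh hidden layer, a tanh output layer, and a plain per-sample gradient step with a fixed learning rate (no momentum), so it only illustrates the behaviour I want and is not a tuned implementation.

```java
import java.util.Random;

/** Hypothetical helper (not an Encog class): a tiny MLP trained one sample at a time. */
final class OnlineMlp {
    private final int nIn, nHid, nOut;
    private final double[][] w1;   // [nHid][nIn + 1], last column holds the bias weight
    private final double[][] w2;   // [nOut][nHid + 1], last column holds the bias weight
    private final double[] hidden; // hidden activations from the most recent forward pass
    private final double[] output; // output activations from the most recent forward pass
    private final double learningRate;

    OnlineMlp(int nIn, int nHid, int nOut, double learningRate, long seed) {
        this.nIn = nIn;
        this.nHid = nHid;
        this.nOut = nOut;
        this.learningRate = learningRate;
        Random rng = new Random(seed);
        this.w1 = randomMatrix(nHid, nIn + 1, rng);
        this.w2 = randomMatrix(nOut, nHid + 1, rng);
        this.hidden = new double[nHid];
        this.output = new double[nOut];
    }

    // Small random initial weights in [-0.5, 0.5).
    private static double[][] randomMatrix(int rows, int cols, Random rng) {
        double[][] m = new double[rows][cols];
        for (double[] row : m) {
            for (int j = 0; j < cols; j++) {
                row[j] = rng.nextDouble() - 0.5;
            }
        }
        return m;
    }

    /** Forward pass; can be called at any time between training steps. */
    double[] predict(double[] input) {
        for (int h = 0; h < nHid; h++) {
            double sum = w1[h][nIn];
            for (int i = 0; i < nIn; i++) sum += w1[h][i] * input[i];
            hidden[h] = Math.tanh(sum);
        }
        for (int o = 0; o < nOut; o++) {
            double sum = w2[o][nHid];
            for (int h = 0; h < nHid; h++) sum += w2[o][h] * hidden[h];
            output[o] = Math.tanh(sum);
        }
        return output.clone();
    }

    /** One gradient step on a single sample; returns the squared error before the update. */
    double train(double[] input, double[] target) {
        predict(input);
        double error = 0;
        double[] deltaOut = new double[nOut];
        for (int o = 0; o < nOut; o++) {
            double diff = target[o] - output[o];
            error += diff * diff;
            deltaOut[o] = diff * (1 - output[o] * output[o]); // tanh'(z) = 1 - tanh(z)^2
        }
        // Back-propagate through w2 before it is modified.
        double[] deltaHid = new double[nHid];
        for (int h = 0; h < nHid; h++) {
            double sum = 0;
            for (int o = 0; o < nOut; o++) sum += deltaOut[o] * w2[o][h];
            deltaHid[h] = sum * (1 - hidden[h] * hidden[h]);
        }
        for (int o = 0; o < nOut; o++) {
            for (int h = 0; h < nHid; h++) w2[o][h] += learningRate * deltaOut[o] * hidden[h];
            w2[o][nHid] += learningRate * deltaOut[o];
        }
        for (int h = 0; h < nHid; h++) {
            for (int i = 0; i < nIn; i++) w1[h][i] += learningRate * deltaHid[h] * input[i];
            w1[h][nIn] += learningRate * deltaHid[h];
        }
        return error;
    }
}
```

In the RL loop I would then call train(state, target) once per observed sample and predict(state) whenever I need a value estimate. The weights live in the network object itself, so nothing is reset between calls.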