My problem is that I am training a feed-forward NN to predict how long a train takes to travel from one station to another. It has two hidden layers (128 and 64 units) and uses TANH activations. The first thing that doesn't make sense to me is why the model predicts better on the validation set than on the training set. Also, at some point the loss starts oscillating.
I checked my data and the two sets are disjoint, with no duplicates. Maybe it is because the rows are very similar (same routes, same train types) and that is the reason for this behavior?
I am using DL4J. The validation set is 10% of the training set, and my dataset contains more than 130,000 rows (for this particular example).
Edit: Here are the values I am plotting.
for (int i = 0; i < nEpochs; i++) {
    trainingSetIterator.reset();
    model.fit(trainingSetIterator);

    // Evaluate once per epoch on each set and reuse the results,
    // instead of calling evaluateRegression() twice per set.
    double validationRmse = model.evaluateRegression(validationIterator).averagerootMeanSquaredError();
    double trainRmse = model.evaluateRegression(trainingSetIterator).averagerootMeanSquaredError();

    System.out.println(i + ": " + validationRmse + " || " + trainRmse);
    validationValues[i] = validationRmse;
    trainValues[i] = trainRmse;
}
PlotRMSE plot = new PlotRMSE(trainValues, validationValues);
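In case it matters, the 90/10 split is done along these lines (a minimal sketch; allData, batchSize and the variable names stand in for my actual code, and the exact helper calls may differ):

import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.SplitTestAndTrain;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.deeplearning4j.datasets.iterator.impl.ListDataSetIterator;

// allData: the full DataSet, loaded and normalized elsewhere.
// Shuffle, then hold out 10% of the rows as the validation set.
allData.shuffle();
SplitTestAndTrain split = allData.splitTestAndTrain(0.9); // 90% train, 10% validation
DataSetIterator trainingSetIterator =
        new ListDataSetIterator<>(split.getTrain().asList(), batchSize);
DataSetIterator validationIterator =
        new ListDataSetIterator<>(split.getTest().asList(), batchSize);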
And here is my config for the neural net:
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(seed)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT) // added
        .weightInit(WeightInit.XAVIER)
        .dropOut(0.6) // in DL4J this value is the probability of retaining an activation
        .updater(new Adam(learningRate))
        .list()
        .layer(new DenseLayer.Builder().nIn(numInputs).nOut(numHiddenNodes1)
                .activation(Activation.TANH).build())
        .layer(new DenseLayer.Builder().nIn(numHiddenNodes1).nOut(numHiddenNodes2)
                .activation(Activation.TANH).build())
        .layer(new DenseLayer.Builder().nIn(numHiddenNodes2).nOut(numHiddenNodes2)
                .activation(Activation.TANH).build())
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                .activation(Activation.IDENTITY)
                .nIn(numHiddenNodes2).nOut(numOutputs).build())
        .backpropType(BackpropType.Standard)
        .build();
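The model is then built from this configuration in the standard DL4J way; a minimal sketch (the ScoreIterationListener and its print frequency are illustrative, not my exact monitoring code):

import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.optimize.listeners.ScoreIterationListener;

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
// Print the training score every 100 iterations to watch for the oscillation.
model.setListeners(new ScoreIterationListener(100));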