I'm trying to create a simple LSTM using DeepLearning4J, with 2 input features and a time-series length of 1. I'm running into a strange issue, however: after training the network, inputting test data yields the same arbitrary result regardless of the input values. My code is shown below.
(UPDATED)
public class LSTMRegression {
    public static final int inputSize = 2,
                            lstmLayerSize = 4,
                            outputSize = 1;

    public static final double learningRate = 0.0001;

    public static void main(String[] args) {
        int miniBatchSize = 99;

        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .miniBatch(false)
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                .updater(new Adam(learningRate))
                .list()
                .layer(0, new LSTM.Builder().nIn(inputSize).nOut(lstmLayerSize)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.TANH).build())
//                .layer(1, new LSTM.Builder().nIn(lstmLayerSize).nOut(lstmLayerSize)
//                        .weightInit(WeightInit.XAVIER)
//                        .activation(Activation.SIGMOID).build())
//                .layer(2, new LSTM.Builder().nIn(lstmLayerSize).nOut(lstmLayerSize)
//                        .weightInit(WeightInit.XAVIER)
//                        .activation(Activation.SIGMOID).build())
                .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MSE)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.IDENTITY)
                        .nIn(lstmLayerSize).nOut(outputSize).build())
                .backpropType(BackpropType.TruncatedBPTT)
                .tBPTTForwardLength(miniBatchSize)
                .tBPTTBackwardLength(miniBatchSize)
                .build();

        final var network = new MultiLayerNetwork(conf);
        final DataSet train = getTrain();
        final INDArray test = getTest();
        final DataNormalization normalizer = new NormalizerMinMaxScaler(0, 1);
//                                         = new NormalizerStandardize();

        normalizer.fitLabel(true);
        normalizer.fit(train);
        normalizer.transform(train);
        normalizer.transform(test);

        network.init();

        for (int i = 0; i < 100; i++)
            network.fit(train);

        final INDArray output = network.output(test);

        normalizer.revertLabels(output);

        System.out.println(output);
    }
    public static INDArray getTest() {
        double[][][] test = new double[][][]{
                {{20}, {203}},
                {{16}, {183}},
                {{20}, {190}},
                {{18.6}, {193}},
                {{18.9}, {184}},
                {{17.2}, {199}},
                {{20}, {190}},
                {{17}, {181}},
                {{19}, {197}},
                {{16.5}, {198}},
                ...
        };

        return Nd4j.create(test);
    }
    public static DataSet getTrain() {
        double[][][] inputArray = {
                {{18.7}, {181}},
                {{17.4}, {186}},
                {{18}, {195}},
                {{19.3}, {193}},
                {{20.6}, {190}},
                {{17.8}, {181}},
                {{19.6}, {195}},
                {{18.1}, {193}},
                {{20.2}, {190}},
                {{17.1}, {186}},
                ...
        };

        double[][] outputArray = {
                {3750},
                {3800},
                {3250},
                {3450},
                {3650},
                {3625},
                {4675},
                {3475},
                {4250},
                {3300},
                ...
        };

        INDArray input = Nd4j.create(inputArray);
        INDArray labels = Nd4j.create(outputArray);

        return new DataSet(input, labels);
    }
}
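As a side note on the shapes involved (this is not part of my original code, just an illustration): the warning in the output below says the labels are 2-d with shape [99, 1], while truncated BPTT expects 3-d arrays of shape [miniBatchSize, nOut, timeSeriesLength]. A plain-Java sketch of expanding per-sequence labels into that 3-d layout, where the helper name `toRnnLabels` is hypothetical:

```java
public class LabelShape {
    // Hypothetical helper: expand labels of shape [miniBatchSize][nOut]
    // into [miniBatchSize][nOut][timeSeriesLength], repeating each
    // sequence label at every time step.
    static double[][][] toRnnLabels(double[][] labels, int timeSeriesLength) {
        double[][][] out = new double[labels.length][][];
        for (int i = 0; i < labels.length; i++) {
            out[i] = new double[labels[i].length][timeSeriesLength];
            for (int j = 0; j < labels[i].length; j++) {
                for (int t = 0; t < timeSeriesLength; t++) {
                    out[i][j][t] = labels[i][j];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] labels = {{3750}, {3800}, {3250}};
        double[][][] rnnLabels = toRnnLabels(labels, 1);
        System.out.println(rnnLabels.length + " "
                + rnnLabels[0].length + " "
                + rnnLabels[0][0].length);
        // prints "3 1 1"
    }
}
```

The resulting nested array matches the [miniBatchSize, nOut, timeSeriesLength] layout the warning describes and could then be passed to `Nd4j.create`.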
Here's an example of the output:
(UPDATED)
00:06:04.554 [main] WARN o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]
(the same warning is repeated nine more times)
[[[3198.1614]],
[[2986.7781]],
[[3059.7017]],
[[3105.3828]],
[[2994.0127]],
[[3191.4468]],
[[3059.7017]],
[[2962.4341]],
[[3147.4412]],
[[3183.5991]]]
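For context on the printed values: `revertLabels` maps the network's normalized outputs back into label units. A quick plain-Java sketch of the min-max transform and its inverse, assuming the min/max come from the training labels shown above (3250 and 4675 among the listed rows):

```java
public class MinMaxRevert {
    public static void main(String[] args) {
        double min = 3250, max = 4675; // label min/max from the rows shown above

        // Forward transform: label units -> [0, 1]
        double normalized = (3750 - min) / (max - min);

        // Inverse transform: [0, 1] -> label units
        double reverted = normalized * (max - min) + min;

        System.out.println(reverted); // recovers 3750 (up to floating-point rounding)
    }
}
```

So values like 3198.1614 above are already in label units; the network's raw outputs would have been roughly in [0, 1].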
So far I've tried changing a number of hyperparameters, including the updater (previously Adam), the activation function in the hidden layers (previously ReLU), and the learning rate; none of these fixed the issue.
Thank you.