DeepLearning4j NN for prediction function doesn't converge

Question

I'm trying to do a simple prediction in DL4J (I'm going to use it later for a large dataset with n features), but no matter what I do my network just doesn't want to learn and behaves very strangely. Of course I studied all the tutorials and followed the same steps shown in the DL4J repo, but somehow it doesn't work for me.

For dummy feature data I use:

double[val][x] features, where val = linspace(-10,10)... and x = Math.sqrt(Math.abs(val)) * val

My y is: double[y] labels, where y = Math.sin(val) / val
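For readers who want to reproduce the toy data, here is a minimal plain-Java sketch of the description above. The class and method names are hypothetical, and it assumes each feature row holds the pair [val, x]; the question does not show the actual generation code. One thing the sketch makes explicit: in Java, Math.sin(0.0) / 0.0 evaluates to NaN, so if the linspace happens to contain exactly 0 the label has to be special-cased (here with the limit value 1.0).

```java
import java.util.Arrays;

// Hypothetical helper; names are illustrative and not taken from the question.
final class SincToyData {

    // Evenly spaced values from lo to hi inclusive (requires n >= 2).
    static double[] linspace(double lo, double hi, int n) {
        double[] out = new double[n];
        double step = (hi - lo) / (n - 1);
        for (int i = 0; i < n; i++) {
            out[i] = lo + i * step;
        }
        out[n - 1] = hi; // pin the end point against rounding drift
        return out;
    }

    // Assumed layout: one row per sample, [val, x] with x = sqrt(|val|) * val.
    static double[][] features(double[] vals) {
        double[][] f = new double[vals.length][2];
        for (int i = 0; i < vals.length; i++) {
            f[i][0] = vals[i];
            f[i][1] = Math.sqrt(Math.abs(vals[i])) * vals[i];
        }
        return f;
    }

    // Labels y = sin(val) / val; uses the limit 1.0 at val == 0 to avoid NaN.
    static double[] labels(double[] vals) {
        double[] y = new double[vals.length];
        for (int i = 0; i < vals.length; i++) {
            y[i] = vals[i] == 0.0 ? 1.0 : Math.sin(vals[i]) / vals[i];
        }
        return y;
    }

    public static void main(String[] args) {
        double[] vals = linspace(-10, 10, 5); // -10, -5, 0, 5, 10
        System.out.println(Arrays.deepToString(features(vals)));
        System.out.println(Arrays.toString(labels(vals)));
    }
}
```

A single NaN label is enough to turn the loss into NaN, so it is worth checking the generated arrays before building a DataSet from them.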

DataSetIterator dataset_train_iter = getTrainingData(x_features, y_outputs_train, batchSize, rnd);
DataSetIterator dataset_test_iter = getTrainingData(x_features_test, y_outputs_test, batchSize, rnd);

// Normalize data, including labels (fitLabel=true)
NormalizerMinMaxScaler normalizer = new NormalizerMinMaxScaler(0, 1);
normalizer.fitLabel(false);
normalizer.fit(dataset_train_iter);
normalizer.fit(dataset_test_iter);

// Use the .transform function only if you are working with a small dataset and no iterator
normalizer.transform(dataset_train_iter.next());
normalizer.transform(dataset_test_iter.next());

dataset_train_iter.setPreProcessor(normalizer);
dataset_test_iter.setPreProcessor(normalizer);

//DataSet setNormal = dataset.next();

//Create the network

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(seed)
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                .weightInit(WeightInit.XAVIER)
                //.miniBatch(true)
                //.l2(1e-4)
                //.activation(Activation.TANH)
                .updater(new Nesterovs(0.1,0.3))
                .list()
                .layer(new DenseLayer.Builder().nIn(numInputs).nOut(20).activation(Activation.TANH)
                        .build())
                .layer(new DenseLayer.Builder().nIn(20).nOut(10).activation(Activation.TANH)
                        .build())
                .layer( new DenseLayer.Builder().nIn(10).nOut(6).activation(Activation.TANH)
                        .build())
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                        .activation(Activation.IDENTITY)
                        .nIn(6).nOut(1).build())
                .build();

//Train and fit network

final MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
net.setListeners(new ScoreIterationListener(100));
// Train the network on the full data set, and evaluate it periodically
final INDArray[] networkPredictions = new INDArray[nEpochs / plotFrequency];
for (int i = 0; i < nEpochs; i++) {
    // fit() already performs backpropagation; see the release notes:
    // https://deeplearning4j.konduit.ai/release-notes/1.0.0-beta3
    net.fit(dataset_train_iter);
    dataset_train_iter.reset();
    if ((i + 1) % plotFrequency == 0) networkPredictions[i / plotFrequency] = net.output(x_features, false);
}

// evaluate and plot

dataset_test_iter.reset();
dataset_train_iter.reset();

INDArray predicted = net.output(dataset_test_iter, false);
System.out.println("PREDICTED ARRAY                " + predicted);
INDArray output_train = net.output(dataset_train_iter, false);

// Revert data back to original values for plotting
// normalizer.revertLabels(predicted);
normalizer.revertLabels(output_train);
normalizer.revertLabels(predicted);

PlotUtil.plot(om, y_outputs_train, networkPredictions);

My output then looks very strange (see picture below), even when I vary the minibatch size (1, 20, or 100 samples per batch), change the number of epochs, or add hidden nodes and hidden layers (I tried up to 1000 nodes and 5 layers). The network outputs either seemingly random values or a single constant y. I just can't see what is going wrong here. Why doesn't the network even approach the training function?

Another question: what does iter.reset() do exactly? Does it move the iterator's pointer back to batch 0 of the DataSetIterator?

Maybe I should mention that I create x_features and y_labels directly from my double[][] and double[] arrays like this: `INDArray matrix = Nd4j.create(double[][]);` and then create my **DataSetIterator** this way: `DataSet allData = new DataSet(x, y); final List<DataSet> list = allData.asList(); Collections.shuffle(list, rng); return new ListDataSetIterator<>(list, batchSize);` - BigEl, Jul 30 '21 at 07:22

score 1 · Answer 1 · answered Jul 29 '21 at 13:52

A pretty common problem for people doing toy problems like this is DL4J's assumption of minibatches (which is what 99% of problems use). You aren't actually doing minibatch learning, which defeats the point of using an iterator: an iterator is meant to iterate through slices of a dataset, not a small in-memory dataset. A small recommendation is to just use the normal DataSet API (a DataSet is what the iterator's next() returns).

Ensure you turn off the minibatch penalty DL4J assigns to all losses with .miniBatch(false). You can see that configuration here: https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/NeuralNetConfiguration.java#L434

A unit test testing this behavior can be found here: https://github.com/eclipse/deeplearning4j/blob/b4047006ac8175df295c2f3c008e7601437ea4dc/deeplearning4j/deeplearning4j-core/src/test/java/org/deeplearning4j/gradientcheck/GradientCheckTests.java#L94

For posterity, here is the relevant configuration:


        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder().miniBatch(false)
                .dataType(DataType.DOUBLE)
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).updater(new NoOp())
                .list()
                .layer(0,
                        new DenseLayer.Builder().nIn(4).nOut(3)
                                .dist(new NormalDistribution(0, 1))
                                .activation(Activation.TANH)
                                .build())
                .layer(1, new OutputLayer.Builder(LossFunction.MCXENT)
                        .activation(Activation.SOFTMAX).nIn(3).nOut(3).build())
                .build();

You'll notice two things: first, miniBatch is set to false; second, the data type is configured as double. You are welcome to try both for your problem. To save memory, DL4J assumes float as the default data type.

This is a reasonable assumption when working on larger problems, but may not work well for toy problems.
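As a small plain-Java illustration of why the data type can matter on a toy problem (this is not DL4J code, and the class and method names are hypothetical): a very small update added to a value near 1.0 is rounded away entirely in float precision, while double keeps it.

```java
// Hypothetical illustration of float versus double precision; not DL4J code.
final class FloatVsDouble {

    // True if adding delta to w changes the stored float value.
    static boolean survivesFloat(float w, float delta) {
        float sum = w + delta;
        return sum != w;
    }

    // True if adding delta to w changes the stored double value.
    static boolean survivesDouble(double w, double delta) {
        double sum = w + delta;
        return sum != w;
    }

    public static void main(String[] args) {
        System.out.println("float keeps 1e-8 update:  " + survivesFloat(1.0f, 1e-8f));
        System.out.println("double keeps 1e-8 update: " + survivesDouble(1.0, 1e-8));
    }
}
```

Whether this actually affects a given network depends on the size of its updates; the sketch only shows the rounding behaviour, which is also why gradient checks such as the linked unit test run in double.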

For reference, you can find the application of the minibatch math here: https://github.com/eclipse/deeplearning4j/blob/fc735d30023981ebbb0fafa55ea9520ec44292e0/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/updater/BaseMultiLayerUpdater.java#L332

This affects the gradient updates.

The score penalty can be found in the output layer: https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/layers/BaseOutputLayer.java#L84

Essentially, both of these automatically apply the minibatch penalty to your dataset, and it is reflected in both the loss and the gradient updates.
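To make the scaling concrete, here is a rough plain-Java sketch. It is not DL4J's source; the names are hypothetical, and it assumes that with miniBatch enabled the summed score and summed gradient are each divided by the number of examples in the batch, which is how I read the two linked files.

```java
// Hypothetical illustration only; this is not DL4J source code.
final class MiniBatchScaling {

    // Sum of squared errors over the batch for the one-weight model yHat = w * x.
    static double summedLoss(double w, double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double err = w * x[i] - y[i];
            sum += err * err;
        }
        return sum;
    }

    // Gradient of the summed loss with respect to w.
    static double summedGradient(double w, double[] x, double[] y) {
        double g = 0.0;
        for (int i = 0; i < x.length; i++) {
            g += 2.0 * (w * x[i] - y[i]) * x[i];
        }
        return g;
    }

    // Assumed behaviour: divide by the batch size only when miniBatch is true.
    static double scale(double value, int batchSize, boolean miniBatch) {
        return miniBatch ? value / batchSize : value;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 4.0};
        double[] y = {2.0, 4.0, 6.0, 8.0};
        double loss = summedLoss(1.0, x, y);
        double grad = summedGradient(1.0, x, y);
        System.out.println("miniBatch=false: loss=" + scale(loss, 4, false) + " grad=" + scale(grad, 4, false));
        System.out.println("miniBatch=true:  loss=" + scale(loss, 4, true) + " grad=" + scale(grad, 4, true));
    }
}
```

With the same learning rate, the divided gradient takes a step that is smaller by a factor of the batch size, which is one reason the setting interacts with how the data is fed to the network.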

Thank you a lot. I have tried your configuration, but it doesn't improve my network much. It seems like the output is completely random or the weights are not being updated. Maybe it's down to the iterator, or... I'm completely lost. https://drive.google.com/file/d/1c5aqZAOs51YdkH5wrdIgxO90FiOC1urU/view?usp=sharing For this prediction I used: BatchSize: 10; Epochs: 100 - BigEl, Jul 29 '21 at 18:43
One other thing I would suggest is to actually normalize your labels or use RMSE. Normalizing your labels can make it easier for the network to learn. The normalizers in DL4J all have a revert function for both input features and labels for when you want the unnormalized output. - Adam Gibson, Jul 31 '21 at 11:50
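
The idea behind label normalization and reverting can be sketched in plain Java as follows. This is only an illustration with hypothetical names, not the DL4J normalizer itself; in the question's code the corresponding calls are fitLabel(...) and revertLabels(...) on NormalizerMinMaxScaler. The sketch assumes the statistics are computed from the training labels only and then reused for predictions.

```java
// Hypothetical min-max scaler for labels; illustrates the idea, not DL4J's implementation.
final class LabelMinMaxScaler {
    private final double min;
    private final double max;

    // Computes min and max from the training labels only.
    LabelMinMaxScaler(double[] trainLabels) {
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        for (double v : trainLabels) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
        }
        if (!(hi > lo)) {
            throw new IllegalArgumentException("labels must span a non-zero range");
        }
        this.min = lo;
        this.max = hi;
    }

    // Maps labels into [0, 1] using the stored training statistics.
    double[] transform(double[] labels) {
        double[] out = new double[labels.length];
        for (int i = 0; i < labels.length; i++) {
            out[i] = (labels[i] - min) / (max - min);
        }
        return out;
    }

    // Maps scaled values (for example network predictions) back to the original units.
    double[] revert(double[] scaled) {
        double[] out = new double[scaled.length];
        for (int i = 0; i < scaled.length; i++) {
            out[i] = scaled[i] * (max - min) + min;
        }
        return out;
    }
}
```

Usage would be to call transform on the training labels before fitting and revert on the network's predictions before plotting them against the original y values.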
