I'm trying to develop some intuition for machine learning. I looked over the examples from https://github.com/deeplearning4j/dl4j-0.4-examples and wanted to develop my own example. Basically I just took a simple function, a * a + b * b + c * c - a * b * c + a + b + c, generated 100,000 outputs for random a, b, c, and tried to train my network on 90% of the inputs. The thing is, no matter what I did, my network never manages to predict the remaining examples.

Here is my code:

// imports as of the DL4J 0.4.x era; package locations changed in later versions
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;

import org.deeplearning4j.datasets.iterator.DataSetIterator;
import org.deeplearning4j.datasets.iterator.impl.ListDataSetIterator;
import org.deeplearning4j.eval.Evaluation;
import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.Updater;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.deeplearning4j.optimize.api.IterationListener;
import org.deeplearning4j.optimize.listeners.ScoreIterationListener;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.SplitTestAndTrain;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.lossfunctions.LossFunctions;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BasicFunctionNN {

    // fixed: the original referenced MlPredict.class, which does not exist in this file
    private static Logger log = LoggerFactory.getLogger(BasicFunctionNN.class);

    public static DataSetIterator generateFunctionDataSet() {
        Collection<DataSet> list = new ArrayList<>();
        for (int i = 0; i < 100000; i++) {
            double a = Math.random();
            double b = Math.random();
            double c = Math.random();

            double output = a * a + b * b + c * c - a * b * c + a + b + c;
            INDArray in = Nd4j.create(new double[]{a, b, c});
            INDArray out = Nd4j.create(new double[]{output});
            list.add(new DataSet(in, out));
        }
        return new ListDataSetIterator(list, list.size());
    }

    public static void main(String[] args) throws Exception {
        DataSetIterator iterator = generateFunctionDataSet();

        Nd4j.MAX_SLICES_TO_PRINT = 10;
        Nd4j.MAX_ELEMENTS_PER_SLICE = 10;

        final int numInputs = 3;
        int outputNum = 1;
        int iterations = 100;

        log.info("Build model....");
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .iterations(iterations).weightInit(WeightInit.XAVIER).updater(Updater.SGD).dropOut(0.5)
                .learningRate(.8).regularization(true)
                .l1(1e-1).l2(2e-4)
                .optimizationAlgo(OptimizationAlgorithm.LINE_GRADIENT_DESCENT)
                .list(3)
                .layer(0, new DenseLayer.Builder().nIn(numInputs).nOut(8)
                        .activation("identity")
                        .build())
                .layer(1, new DenseLayer.Builder().nIn(8).nOut(8)
                        .activation("identity")
                        .build())
                .layer(2, new OutputLayer.Builder(LossFunctions.LossFunction.RMSE_XENT)
                        .activation("identity")
                        .weightInit(WeightInit.XAVIER)
                        .nIn(8).nOut(outputNum).build())
                .backprop(true).pretrain(false)
                .build();


        //run the model
        MultiLayerNetwork model = new MultiLayerNetwork(conf);
        model.init();
        model.setListeners(Collections.singletonList((IterationListener) new ScoreIterationListener(iterations)));

        //get the dataset using the record reader. The datasetiterator handles vectorization
        DataSet next = iterator.next();
        SplitTestAndTrain testAndTrain = next.splitTestAndTrain(0.9);
        System.out.println(testAndTrain.getTrain());

        model.fit(testAndTrain.getTrain());

        //evaluate the model
        Evaluation eval = new Evaluation(10);
        DataSet test = testAndTrain.getTest();
        INDArray output = model.output(test.getFeatureMatrix());
        eval.eval(test.getLabels(), output);
        log.info(">>>>>>>>>>>>>>");
        log.info(eval.stats());

    }
}

I also played with the learning rate, and it often happens that the score does not improve:

10:48:51.404 [main] DEBUG o.d.o.solvers.BackTrackLineSearch - Exited line search after maxIterations termination condition; score did not improve (bestScore=0.8522868127536543, scoreAtStart=0.8522868127536543). Resetting parameters

I also tried relu as the activation function.

user1995187

1 Answer

One obvious problem is that you are trying to model a nonlinear function with a linear model. Your neural network has no nonlinear activation functions ("identity" everywhere), so it can effectively only express functions of the form W1a + W2b + W3c + W4. It does not matter how many hidden units you create - as long as no nonlinear activation function is used, your network degenerates into a simple linear model.
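
To make this concrete: stacking identity-activation layers just composes linear maps, and a composition of linear maps is itself linear. For two layers with weights W1, W2 and biases b1, b2 (a worked step added here for clarity):

W2(W1x + b1) + b2 = (W2W1)x + (W2b1 + b2) = Wx + b

So the hidden layers add parameters but no expressive power, and terms like a*a or a*b*c stay out of reach.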

Update

There are also many "small weird things", including but not limited to:

  • you are using a huge learning rate (0.8)
  • you are using a lot of regularization (both l1 and l2, which is quite heavy and not a common approach for regression, especially in neural networks) for a problem where you need none
  • rectifier units might not be the best ones for expressing the square and multiplication operations you are looking for. Rectifiers are very good for classification, especially with deeper architectures, but not for shallow regression. Try sigmoid-like activations (tanh, sigmoid) instead.
  • I am not entirely sure what "iteration" means in this implementation, but it is usually the number of samples/minibatches used for training, so using just 100 might be orders of magnitude too few for gradient descent learning (see the sketch after this list)
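
Putting these points together, here is a minimal sketch of what a more reasonable configuration might look like. It assumes the same DL4J 0.4.x builder API as the question; the switch to MSE loss and the exact learning rate and iteration count are illustrative assumptions, not tuned values:

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .iterations(2000)                       // far more optimization steps than 100
        .weightInit(WeightInit.XAVIER)
        .updater(Updater.SGD)
        .learningRate(0.01)                     // much smaller step than 0.8
        // no dropout and no l1/l2: this problem needs no regularization
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .list(3)
        .layer(0, new DenseLayer.Builder().nIn(3).nOut(8)
                .activation("tanh")             // nonlinearity so squares/products become learnable
                .build())
        .layer(1, new DenseLayer.Builder().nIn(8).nOut(8)
                .activation("tanh")
                .build())
        .layer(2, new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                .activation("identity")         // linear output is fine for regression
                .nIn(8).nOut(1).build())
        .backprop(true).pretrain(false)
        .build();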
lejlot