
I am working on translating a Lasagne neural network into Deeplearning4j code. So far I have managed to get the layers in place, but I am not sure whether the other configurations are correct. I am not an expert in neural networks and cannot easily find the equivalent functions/methods in Deeplearning4j.

This is the Lasagne (Python) code:

    conv_net = NeuralNet(
    layers=[
        ('input', layers.InputLayer),
        ('conv1a', layers.Conv2DLayer),
        ('conv1', layers.Conv2DLayer),
        ('pool1', layers.MaxPool2DLayer),
        ('dropout1', layers.DropoutLayer),
        ('conv2a', layers.Conv2DLayer),
        ('conv2', layers.Conv2DLayer),
        ('pool2', layers.MaxPool2DLayer),
        ('dropout2', layers.DropoutLayer),
        ('conv3a', layers.Conv2DLayer),
        ('conv3', layers.Conv2DLayer),
        ('pool3', layers.MaxPool2DLayer),
        ('dropout3', layers.DropoutLayer),
        ('hidden4', layers.DenseLayer),
        ('dropout4', layers.DropoutLayer),
        ('hidden5', layers.DenseLayer),
        ('output', layers.DenseLayer),
    ],

    input_shape=(None, NUM_CHANNELS, IMAGE_SIZE, IMAGE_SIZE),
    conv1a_num_filters=16, conv1a_filter_size=(7, 7), conv1a_nonlinearity=leaky_rectify,
    conv1_num_filters=32, conv1_filter_size=(5, 5), conv1_nonlinearity=leaky_rectify, pool1_pool_size=(2, 2), dropout1_p=0.1,
    conv2a_num_filters=64, conv2a_filter_size=(5, 5), conv2a_nonlinearity=leaky_rectify,
    conv2_num_filters=64, conv2_filter_size=(3, 3), conv2_nonlinearity=leaky_rectify, pool2_pool_size=(2, 2), dropout2_p=0.2,
    conv3a_num_filters=256, conv3a_filter_size=(3, 3), conv3a_nonlinearity=leaky_rectify,
    conv3_num_filters=256, conv3_filter_size=(3, 3), conv3_nonlinearity=leaky_rectify, pool3_pool_size=(2, 2), dropout3_p=0.2,
    hidden4_num_units=1250, dropout4_p=0.75, hidden5_num_units=1000,
    output_num_units=y.shape[1], output_nonlinearity=None,

    batch_iterator_train=AugmentBatchIterator(batch_size=180),

    update_learning_rate=theano.shared(np.cast['float32'](0.03)),
    update_momentum=theano.shared(np.cast['float32'](0.9)),

    on_epoch_finished=[
        AdjustVariable('update_learning_rate', start=0.01, stop=0.0001),
        AdjustVariable('update_momentum', start=0.9, stop=0.999),
        StoreBestModel('wb_' + out_file_name)
    ],

    regression=True,
    max_epochs=600,
    train_split=0.1,
    verbose=1,
    )

    conv_net.batch_iterator_train.part_flips = flip_idxs
    conv_net.load_params_from('wb_keypoint_net3.pk')

    conv_net.fit(X, y)

And here is what I've got so far in Deeplearning4j:

    int batch = 100;
    int iterations = data.getX().size(0) / batch + 1;
    int epochs = 600;
    logger.warn("Building model");
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .updater(Updater.NESTEROVS).momentum(0.9)
            .activation(Activation.RELU)
            .weightInit(WeightInit.XAVIER)
            .learningRate(0.3)
            .learningRateDecayPolicy(LearningRatePolicy.Score)
            .lrPolicyDecayRate(0.1)
            .regularization(true).l2(1e-4)
            .list()
            .layer(0, new ConvolutionLayer.Builder(7, 7).activation(Activation.LEAKYRELU).nOut(16).build()) //rectified linear units
            .layer(1, new ConvolutionLayer.Builder(5, 5).nOut(32).activation(Activation.LEAKYRELU).build())
            .layer(2, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX).kernelSize(2, 2).build())
            .layer(3, new DropoutLayer.Builder(0.1).build())
            .layer(4, new ConvolutionLayer.Builder(5, 5).nOut(64).activation(Activation.LEAKYRELU).build())
            .layer(5, new ConvolutionLayer.Builder(3, 3).nOut(64).activation(Activation.LEAKYRELU).build())
            .layer(6, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX).kernelSize(2, 2).build())
            .layer(7, new DropoutLayer.Builder(0.2).build())
            .layer(8, new ConvolutionLayer.Builder(3, 3).nOut(256).activation(Activation.LEAKYRELU).build())
            .layer(9, new ConvolutionLayer.Builder(3, 3).nOut(256).activation(Activation.LEAKYRELU).build())
            .layer(10, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX).kernelSize(2, 2).build())
            .layer(11, new DropoutLayer.Builder(0.2).build())
            .layer(12, new DenseLayer.Builder().nOut(1250).build())
            .layer(13, new DropoutLayer.Builder(0.75).build())
            .layer(14, new DenseLayer.Builder().nOut(1000).build())
            .layer(15, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                    .nOut(data.getY().size(1)).activation(Activation.SOFTMAX).build())
            .setInputType(InputType.convolutional(image_size, image_size, num_channels))
            .backprop(true).pretrain(false)
            .build();

    MultiLayerNetwork model = new MultiLayerNetwork(conf);
    DataSet dataSet = new DataSet(data.getX(), data.getY());


    MiniBatchFileDataSetIterator iterator1 = new MiniBatchFileDataSetIterator(dataSet, batch);


    model.init();
    logger.warn("Train model");

    model.setListeners(new ScoreIterationListener(iterations));
    UtilSaveLoadMultiLayerNetwork uslmln = new UtilSaveLoadMultiLayerNetwork();
    for (int i = 0; i < epochs; i++) {
        logger.warn("Started epoch " + i);
        model.fit(iterator1);
        uslmln.save(model, filename);
    }

I am mainly interested in whether the activation functions and the configurations are equivalent. The problem is that when I run the neural network in Java, it does not seem to learn at all: the score stays at around 0.2 even after 50 epochs, with no visible improvement, so I am sure something is misconfigured.

Thanks

1 Answer


Is your data pipeline exactly the same? This includes normalization and the like as well. With Deeplearning4j you don't need to specify the number of inputs; we infer that for you. Also, you are using the UI server wrong. Our examples demonstrate how to do these things already: https://github.com/deeplearning4j/dl4j-examples
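
If the images are not already scaled the same way as in the Python pipeline, a minimal sketch of adding normalization to a DL4J iterator could look like this (the scaler choice and range are assumptions, and `iterator1` is the iterator from the question):

    // Sketch only (not part of the original pipeline): scale raw pixel values
    // into [0, 1] before training. Skip this if the data was already normalized
    // upstream, as in the Python code. The classes live in
    // org.nd4j.linalg.dataset.api.preprocessor.
    DataNormalization scaler = new ImagePreProcessingScaler(0, 1);
    scaler.fit(iterator1);             // gather statistics (a no-op for this scaler)
    iterator1.setPreProcessor(scaler); // applied to every mini-batch the iterator emits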

I'm not sure what led you to do this, but you are reattaching storage every time, so the neural net's actual statistics don't persist over time. You should set that up above your for loop, not inside it. You may want to look into using early stopping if you want model snapshotting like what you're trying to do here.
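
A minimal early-stopping sketch might look roughly like the following (the output directory and the held-out `testIterator` are placeholders, not names from the question):

    // Sketch only: early stopping with best-model snapshotting, using the
    // org.deeplearning4j.earlystopping classes. testIterator is a hypothetical
    // held-out DataSetIterator; "models/" is an arbitrary output directory.
    EarlyStoppingConfiguration<MultiLayerNetwork> esConf =
            new EarlyStoppingConfiguration.Builder<MultiLayerNetwork>()
                    .epochTerminationConditions(new MaxEpochsTerminationCondition(600))
                    .scoreCalculator(new DataSetLossCalculator(testIterator, true))
                    .modelSaver(new LocalFileModelSaver("models/")) // keeps the best model on disk
                    .build();
    EarlyStoppingTrainer trainer = new EarlyStoppingTrainer(esConf, conf, iterator1);
    EarlyStoppingResult<MultiLayerNetwork> result = trainer.fit();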

Also, how did you get to that bias learning rate? It doesn't even show up in your Lasagne config and looks arbitrary; I would advise getting rid of it. Your output layer also looks wrong: you should be using negative log likelihood and softmax (again, look at our examples, it's all in there). From the looks of it, you're also using learning rate decay in Lasagne. Deeplearning4j supports that as well; I would look through our examples for how to do that. We support several learning rate decay policies. You should be able to find them in the Javadoc (http://deeplearning4j.org/doc) or in your IDE's auto-complete.
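
For example, the `AdjustVariable('update_learning_rate', start=0.01, stop=0.0001)` schedule from the Lasagne config could be approximated with one of those policies; a rough sketch in the same 0.8.x-style builder API as the question (the decay rate and step count are illustrative, not a verified equivalent of the linear schedule):

    // Rough sketch: step-wise learning rate decay in place of Lasagne's
    // linear AdjustVariable schedule. Decay rate and step count are
    // illustrative values, not tuned ones.
    NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
            .learningRate(0.01)                               // start value from the AdjustVariable schedule
            .learningRateDecayPolicy(LearningRatePolicy.Step)
            .lrPolicyDecayRate(0.5)                           // halve the rate ...
            .lrPolicySteps(10 * iterations);                  // ... roughly every 10 epochs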

Adam Gibson
  • Thank you for your help! Yes, the pipeline is exactly the same. I have removed the UI part, and I am using negative log likelihood and softmax for the output layer. But now the score for the first iteration is negative, and the second one is NaN. I am still doing something wrong, and I don't know what: INFO: Score at iteration 0 is -13.107518335451656, INFO: Score at iteration 21 is NaN. And if I remove the outputs I get some errors... – user8168172 Jun 16 '17 at 20:26
  • Is the data already normalized? Follow this and see how far you get: http://deeplearning4j.org/troubleshootingneuralnets – Adam Gibson Jun 17 '17 at 10:28
  • Yes, it is normalized. I am training the network with 2000 images. X from the dataset is an ndarray of shape 2000x128x128x3, and Y is an ndarray of shape 2000x16. X has values like 0.24, 0.17, 0.12, etc. and Y has values like -0.55, -0.40, -0.62, etc. Are the values from X too small? – user8168172 Jun 17 '17 at 12:08
  • No, that's a good sign. Your learning rate isn't the same as in your Python script. The other thing I would look at is the learning rate decay. – Adam Gibson Jun 17 '17 at 12:36
  • I have updated the learning rate (from 0.1 to 0.3) and I have tried all the learningRateDecayPolicy types. It is the same result: the first iteration is negative and the others are NaN. I don't know what I am doing wrong. – user8168172 Jun 17 '17 at 14:10
  • Your learning rate in Python is 0.03, not 0.3 o_0. You shouldn't be increasing the learning rate here. – Adam Gibson Jun 17 '17 at 14:14
  • Also, what do your labels look like? They should be one-hot (0 1 0) or similar, otherwise your neural net will diverge. It appears you might be doing regression, looking further at your Python config? I have to guess it's your labels. If you aren't doing classification, you need identity on the output and MSE on the loss function (see the sketch after this thread). I assumed you were doing classification (I didn't look at your other parts). Sorry, I don't know Lasagne that well. You should consider normalizing your labels if they aren't already. See: https://github.com/deeplearning4j/dl4j-examples – Adam Gibson Jun 17 '17 at 14:31
  • The dataset labels have values like this: 0.11, -0.24, 0.31, -0.27, etc. Yes, I am doing classification. I am trying to crop images based on the position of some pixels. – user8168172 Jun 17 '17 at 15:33
  • What you're doing isn't classification. Classification is ONLY probabilities. I would retool your problem. – Adam Gibson Jun 18 '17 at 00:38
  • What does retool mean? To change the library (DL4J)? Or to change the neural network configuration? – user8168172 Jun 18 '17 at 08:14
  • No, I mean change the outputs. You should be outputting probabilities for your labels, not whatever weird numbers you're doing here (positions?). – Adam Gibson Jun 18 '17 at 08:16
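
For reference, a regression-style output layer in the same 0.8.x-era builder API as the question would look roughly like this (a minimal sketch of the identity + MSE setup suggested in the comments, not the asker's verified fix; `numOutputs` is a placeholder for `data.getY().size(1)`):

    // Sketch of a regression output layer: MSE loss + identity activation,
    // matching Lasagne's regression=True with output_nonlinearity=None.
    // numOutputs is a placeholder for the label width (e.g. data.getY().size(1)).
    OutputLayer regressionOutput = new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
            .activation(Activation.IDENTITY) // no softmax: outputs are raw coordinates
            .nOut(numOutputs)
            .build();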