I am working on translating a Lasagne neural network into Deeplearning4j code. So far I've managed to get the layers in place, but I am not sure whether the other configuration options are equivalent. I am not an expert in neural networks and cannot easily find the equivalent functions/methods in Deeplearning4j.
This is the Lasagne Python code:
conv_net = NeuralNet(
    layers=[
        ('input', layers.InputLayer),
        ('conv1a', layers.Conv2DLayer),
        ('conv1', layers.Conv2DLayer),
        ('pool1', layers.MaxPool2DLayer),
        ('dropout1', layers.DropoutLayer),
        ('conv2a', layers.Conv2DLayer),
        ('conv2', layers.Conv2DLayer),
        ('pool2', layers.MaxPool2DLayer),
        ('dropout2', layers.DropoutLayer),
        ('conv3a', layers.Conv2DLayer),
        ('conv3', layers.Conv2DLayer),
        ('pool3', layers.MaxPool2DLayer),
        ('dropout3', layers.DropoutLayer),
        ('hidden4', layers.DenseLayer),
        ('dropout4', layers.DropoutLayer),
        ('hidden5', layers.DenseLayer),
        ('output', layers.DenseLayer),
    ],
    input_shape=(None, NUM_CHANNELS, IMAGE_SIZE, IMAGE_SIZE),
    conv1a_num_filters=16, conv1a_filter_size=(7, 7), conv1a_nonlinearity=leaky_rectify,
    conv1_num_filters=32, conv1_filter_size=(5, 5), conv1_nonlinearity=leaky_rectify, pool1_pool_size=(2, 2), dropout1_p=0.1,
    conv2a_num_filters=64, conv2a_filter_size=(5, 5), conv2a_nonlinearity=leaky_rectify,
    conv2_num_filters=64, conv2_filter_size=(3, 3), conv2_nonlinearity=leaky_rectify, pool2_pool_size=(2, 2), dropout2_p=0.2,
    conv3a_num_filters=256, conv3a_filter_size=(3, 3), conv3a_nonlinearity=leaky_rectify,
    conv3_num_filters=256, conv3_filter_size=(3, 3), conv3_nonlinearity=leaky_rectify, pool3_pool_size=(2, 2), dropout3_p=0.2,
    hidden4_num_units=1250, dropout4_p=0.75, hidden5_num_units=1000,
    output_num_units=y.shape[1], output_nonlinearity=None,
    batch_iterator_train=AugmentBatchIterator(batch_size=180),
    update_learning_rate=theano.shared(np.cast['float32'](0.03)),
    update_momentum=theano.shared(np.cast['float32'](0.9)),
    on_epoch_finished=[
        AdjustVariable('update_learning_rate', start=0.01, stop=0.0001),
        AdjustVariable('update_momentum', start=0.9, stop=0.999),
        StoreBestModel('wb_' + out_file_name)
    ],
    regression=True,
    max_epochs=600,
    train_split=0.1,
    verbose=1,
)
conv_net.batch_iterator_train.part_flips = flip_idxs
conv_net.load_params_from('wb_keypoint_net3.pk')
conv_net.fit(X, y)
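If I understand the nolearn helpers correctly, AdjustVariable linearly anneals update_learning_rate from 0.01 down to 0.0001 (and update_momentum from 0.9 up to 0.999) over max_epochs, essentially np.linspace(start, stop, max_epochs), and StoreBestModel keeps the weights of the best epoch. The closest thing I found in Deeplearning4j is LearningRatePolicy.Schedule with a per-iteration map, so I sketched the schedule like this (the iterationsPerEpoch bookkeeping is my own assumption, untested):

import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the linear annealing that AdjustVariable performs,
// expressed as a DL4J-style schedule map. LearningRatePolicy.Schedule is
// keyed by iteration number, so each epoch boundary is epoch * iterationsPerEpoch.
public class LinearSchedule {
    public static Map<Integer, Double> build(double start, double stop,
                                             int maxEpochs, int iterationsPerEpoch) {
        Map<Integer, Double> schedule = new HashMap<>();
        for (int epoch = 0; epoch < maxEpochs; epoch++) {
            // equivalent of np.linspace(start, stop, maxEpochs)[epoch]
            double value = start + (stop - start) * epoch / (double) (maxEpochs - 1);
            schedule.put(epoch * iterationsPerEpoch, value);
        }
        return schedule;
    }
}

I would then pass this via .learningRateDecayPolicy(LearningRatePolicy.Schedule).learningRateSchedule(schedule) instead of the Score policy I currently use, if I read the builder API right.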
And here is what I've got so far in Deeplearning4j:
int batch = 100;
int iterations = data.getX().size(0) / batch + 1;
int epochs = 600;

logger.warn("Building model");
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(Updater.NESTEROVS).momentum(0.9)
        .activation(Activation.RELU)
        .weightInit(WeightInit.XAVIER)
        .learningRate(0.3)
        .learningRateDecayPolicy(LearningRatePolicy.Score)
        .lrPolicyDecayRate(0.1)
        .regularization(true).l2(1e-4)
        .list()
        .layer(0, new ConvolutionLayer.Builder(7, 7).activation(Activation.LEAKYRELU).nOut(16).build()) // leaky rectified linear units
        .layer(1, new ConvolutionLayer.Builder(5, 5).nOut(32).activation(Activation.LEAKYRELU).build())
        .layer(2, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX).kernelSize(2, 2).build())
        .layer(3, new DropoutLayer.Builder(0.1).build())
        .layer(4, new ConvolutionLayer.Builder(5, 5).nOut(64).activation(Activation.LEAKYRELU).build())
        .layer(5, new ConvolutionLayer.Builder(3, 3).nOut(64).activation(Activation.LEAKYRELU).build())
        .layer(6, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX).kernelSize(2, 2).build())
        .layer(7, new DropoutLayer.Builder(0.2).build())
        .layer(8, new ConvolutionLayer.Builder(3, 3).nOut(256).activation(Activation.LEAKYRELU).build())
        .layer(9, new ConvolutionLayer.Builder(3, 3).nOut(256).activation(Activation.LEAKYRELU).build())
        .layer(10, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX).kernelSize(2, 2).build())
        .layer(11, new DropoutLayer.Builder(0.2).build())
        .layer(12, new DenseLayer.Builder().nOut(1250).build())
        .layer(13, new DropoutLayer.Builder(0.75).build())
        .layer(14, new DenseLayer.Builder().nOut(1000).build())
        .layer(15, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nOut(data.getY().size(1)).activation(Activation.SOFTMAX).build())
        .setInputType(InputType.convolutional(image_size, image_size, num_channels))
        .backprop(true).pretrain(false)
        .build();

MultiLayerNetwork model = new MultiLayerNetwork(conf);
DataSet dataSet = new DataSet(data.getX(), data.getY());
MiniBatchFileDataSetIterator iterator1 = new MiniBatchFileDataSetIterator(dataSet, batch);
model.init();

logger.warn("Train model");
model.setListeners(new ScoreIterationListener(iterations));
UtilSaveLoadMultiLayerNetwork uslmln = new UtilSaveLoadMultiLayerNetwork();
for (int i = 0; i < epochs; i++) {
    logger.warn("Started epoch " + i);
    model.fit(iterator1);
    uslmln.save(model, filename);
}
I am mainly interested in whether the activation functions and the other configuration options are equivalent. The problem is that when I run the network in Java it doesn't seem to learn at all: the score stays around 0.2 even after 50 epochs, with no visible improvement, so I am sure something is misconfigured.
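In particular, since the Lasagne net sets regression=True and output_nonlinearity=None, I suspect my softmax/negative-log-likelihood output layer is wrong for this task and should instead be a linear output with a squared-error loss, something like the following drop-in replacement for layer 15 above (just my guess at the equivalent, not verified):

// Possible regression-style output layer, matching Lasagne's
// regression=True with output_nonlinearity=None: identity activation
// plus mean squared error instead of softmax plus negative log likelihood.
.layer(15, new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
        .nOut(data.getY().size(1))
        .activation(Activation.IDENTITY)
        .build())

Is that the right equivalent, or is something else off in the configuration?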
Thanks