
I'm trying to put together a really simple 3-layer neural network in Lasagne: 30 input neurons, a 10-neuron hidden layer, and a 1-neuron output layer. I'm using the binary_crossentropy loss function and sigmoid nonlinearities. I want L1 regularization on the edges entering the output layer and L2 regularization on the edges from the input layer to the hidden layer. My code is very close to the example on the regularization page of the Lasagne documentation.

The L1 regularization seems to work fine, but whenever I add the L2 penalty term to the loss function, the loss comes back as NaN. Everything works fine when I remove the term l2_penalty * l2_reg_param from the last line below. I'm also able to apply L1 regularization to the hidden layer l_hid1 without any issues, so it seems to be the L2 penalty specifically.
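
To try to narrow it down, here's a sketch of how the penalty terms could be checked in isolation (this uses the l1_penalty and l2_penalty variables defined in the code below; since the penalties depend only on the weight shared variables, the compiled functions take no inputs):

import theano

# Compile zero-argument functions that evaluate each penalty term on its own.
check_l1 = theano.function([], l1_penalty)
check_l2 = theano.function([], l2_penalty)
print(check_l1(), check_l2())  # a NaN here would point at the penalty itself

If both values come back finite, the NaN presumably only appears once the penalty interacts with the gradient updates during training.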

This is my first foray into Theano and Lasagne, so I feel like the error is probably something pretty simple, but I just don't know enough to see it.

Here's the net setup code:

import lasagne
from lasagne.regularization import regularize_layer_params, l1, l2

# input_var, target_var, l1_reg_param, and l2_reg_param are defined elsewhere.
# 30 inputs -> 10-unit sigmoid hidden layer -> 1-unit sigmoid output
l_in = lasagne.layers.InputLayer(shape=(942, 1, 1, 30), input_var=input_var)
l_hid1 = lasagne.layers.DenseLayer(l_in, num_units=10, nonlinearity=lasagne.nonlinearities.sigmoid, W=lasagne.init.GlorotUniform())
network = lasagne.layers.DenseLayer(l_hid1, num_units=1, nonlinearity=lasagne.nonlinearities.sigmoid)

prediction = lasagne.layers.get_output(network)

# L2 penalty on the input->hidden weights, L1 on the hidden->output weights
l2_penalty = regularize_layer_params(l_hid1, l2)
l1_penalty = regularize_layer_params(network, l1)

loss = lasagne.objectives.binary_crossentropy(prediction, target_var)
loss = loss.mean()
loss = loss + l1_penalty * l1_reg_param + l2_penalty * l2_reg_param
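
For completeness, here's a variant of the loss I've been considering in case the NaN is actually coming from the crossentropy rather than the penalty itself (just a sketch, and only a guess at the cause: if the sigmoid saturates to exactly 0 or 1, the log(0) inside the crossentropy produces NaN):

import theano.tensor as T

# Clip the prediction so the crossentropy never evaluates log(0).
eps = 1e-7
clipped = T.clip(prediction, eps, 1 - eps)
loss = lasagne.objectives.binary_crossentropy(clipped, target_var).mean()
loss = loss + l1_penalty * l1_reg_param + l2_penalty * l2_reg_param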

Any help would be greatly appreciated. Thanks!!

Matthew
  • What input data are you using that is `942x1x1x30`? – o-90 Jan 12 '17 at 01:34
  • It's really just 942 samples with 30 features each. I thought from the examples that lasagne always wanted its data as (num_samples x channels x rows x columns). Do you think the strange dimensions here could be throwing off the l2 regularization? Weird that everything else seems to work OK though, including the l1 regularizer. – Matthew Jan 12 '17 at 15:21
  • The lasagne examples you're looking at are probably ones involving classifying images, which are represented as (channels, height, width); the first dimension is batch size. So a typical input would be `(32, 3, 128, 128)`: 32 color 128x128 images. If your data is just 30 features, the input can simply be (batch_size, 30) (see the sketch after these comments). Without access to your data, though, it's hard to say whether this is the cause of your issue. – o-90 Jan 12 '17 at 16:16
  • They are image classification examples. I haven't found many examples that aren't, actually. I've dug a little deeper into this and found that the function `lasagne.regularization.l2` runs fine on a 30x10 numpy array, but when I give it the 30x10 weight matrix of the hidden layer, it errors out in `lasagne/regularization.py` at `return T.sum(x**2)` with: `TypeError: unsupported operand type(s) for ** or pow(): 'CudaNdarray' and 'int'` – Matthew Jan 12 '17 at 17:14
  • Ignore my previous comment, I was giving it its arguments wrong when I generated that error message. I'm back to having no idea why this doesn't work. Part of me is tempted to just switch over to dropout for regularization and be done with it, but another part is too stubborn. – Matthew Jan 12 '17 at 17:21
  • What is your `l2_reg_param` value? – malioboro Jan 20 '17 at 05:56
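
A minimal sketch of the flat-input setup o-90 describes above (this assumes the 942x30 data lives in a NumPy array; the array name X is hypothetical):

import theano.tensor as T
import lasagne

# 2-D input: batch dimension left as None, 30 features per sample.
input_var = T.matrix('inputs')
l_in = lasagne.layers.InputLayer(shape=(None, 30), input_var=input_var)

# The original 942x1x1x30 array would then be flattened to match, e.g.
# X = X.reshape(942, 30) for a NumPy array X.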
