
I was getting strange results when trying to train a simple MLP, and even after stripping the code down to the bare essentials and shrinking it, I am still getting strange results.

Code:

import numpy as np
import theano
import theano.tensor as T
import lasagne


dtype = np.float32
states = np.eye(3, dtype=dtype).reshape(3, 1, 1, 3)
values = np.array([[147, 148, 135, 147], [147, 147, 149, 148], [148, 147, 147, 147]], dtype=dtype)
output_dim = values.shape[1]
hidden_units = 50

#Network setup
inputs = T.tensor4('inputs')
targets = T.matrix('targets')

network = lasagne.layers.InputLayer(shape=(None, 1, 1, 3), input_var=inputs)
network = lasagne.layers.DenseLayer(network, hidden_units, nonlinearity=lasagne.nonlinearities.rectify)
network = lasagne.layers.DenseLayer(network, output_dim)

prediction = lasagne.layers.get_output(network)
loss = lasagne.objectives.squared_error(prediction, targets).mean()
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.sgd(loss, params, learning_rate=0.01)

f_learn = theano.function([inputs, targets], loss, updates=updates)
f_test = theano.function([inputs], prediction)


#Training
it = 5000
for i in range(it):
    l = f_learn(states, values)
    print "Loss: " + str(l)
    print "Expected:"
    print values
    print "Learned:"
    print f_test(states)
    print "Last layer weights:"
    print lasagne.layers.get_all_param_values(network)[-1]

I would expect the network to learn the values given in the values variable, and often it does, but just as often it leaves some output nodes stuck at zero, with a huge loss.

Sample output:

Loss: 5426.83349609
Expected:
[[ 147.  148.  135.  147.]
 [ 147.  147.  149.  148.]
 [ 148.  147.  147.  147.]]
Learned:
[[ 146.99993896    0.          134.99993896  146.99993896]
 [ 146.99993896    0.          148.99993896  147.99993896]
 [ 147.99995422    0.          146.99996948  146.99993896]]
Last layer weights:
[ 11.40957355   0.          11.36747837  10.98625183]

Why is this happening?

1 Answer

I asked the same question in the lasagne-users Google group and was more fortunate there: https://groups.google.com/forum/#!topic/lasagne-users/ock-2RqTaFk. Changing the rectifier units to nonlinearities that tolerate negative outputs helped.
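
For reference, here is a minimal sketch of what that change looks like against the network definition from the question (the rest of the script stays the same; the choice of leaky_rectify for the hidden layer is my own, not from the thread). Note that Lasagne's DenseLayer defaults to a rectify nonlinearity, so the output layer in the original code was a ReLU too; once such a unit's pre-activation goes negative, its output and gradient are both zero and it can never recover, which matches the stuck-at-zero columns in the output.

#Network setup with nonlinearities that tolerate negative values
network = lasagne.layers.InputLayer(shape=(None, 1, 1, 3), input_var=inputs)
# leaky_rectify keeps a small gradient for negative pre-activations,
# so hidden units cannot die permanently (assumption: any leaky variant works here)
network = lasagne.layers.DenseLayer(
    network, hidden_units,
    nonlinearity=lasagne.nonlinearities.leaky_rectify)
# make the output layer linear instead of the DenseLayer default (rectify),
# so output units are never clamped to zero
network = lasagne.layers.DenseLayer(
    network, output_dim,
    nonlinearity=lasagne.nonlinearities.linear)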
