I was getting strange results while trying to train a simple MLP, and even after stripping the code down to the bare essentials and shrinking it, the results are still strange.
Code:
import numpy as np
import theano
import theano.tensor as T
import lasagne
dtype = np.float32
states = np.eye(3, dtype=dtype).reshape(3, 1, 1, 3)  # three one-hot inputs, shaped for the 4D input layer
values = np.array([[147, 148, 135, 147], [147, 147, 149, 148], [148, 147, 147, 147]], dtype=dtype)  # target outputs, one row per input
output_dim = values.shape[1]
hidden_units = 50
#Network setup
inputs = T.tensor4('inputs')
targets = T.matrix('targets')
network = lasagne.layers.InputLayer(shape=(None, 1, 1, 3), input_var=inputs)
network = lasagne.layers.DenseLayer(network, hidden_units, nonlinearity=lasagne.nonlinearities.rectify)  # hidden layer
network = lasagne.layers.DenseLayer(network, output_dim)  # output layer
prediction = lasagne.layers.get_output(network)
loss = lasagne.objectives.squared_error(prediction, targets).mean()
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.sgd(loss, params, learning_rate=0.01)
f_learn = theano.function([inputs, targets], loss, updates=updates)
f_test = theano.function([inputs], prediction)
#Training
it = 5000
for i in range(it):
    l = f_learn(states, values)
print "Loss: " + str(l)
print "Expected:"
print values
print "Learned:"
print f_test(states)
print "Last layer weights:"
print lasagne.layers.get_all_param_values(network)[-1]
I would expect the network to learn the values given in the values variable, and often it does, but just as often it ends up with some output nodes stuck at zero and a huge loss.
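When that happens, the zeros are already visible in the raw predictions. A quick sketch like the following (the index 1 into get_all_layers is assumed from the three-layer setup above) prints the hidden-layer activations so I can check whether any of the rectifier units there are stuck at zero as well:
# rough diagnostic sketch, reusing the variables defined above
hidden_layer = lasagne.layers.get_all_layers(network)[1]  # the 50-unit rectify layer
f_hidden = theano.function([inputs], lasagne.layers.get_output(hidden_layer))
print "Hidden activations:"
print f_hidden(states)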
Sample output:
Loss: 5426.83349609
Expected:
[[ 147. 148. 135. 147.]
[ 147. 147. 149. 148.]
[ 148. 147. 147. 147.]]
Learned:
[[ 146.99993896 0. 134.99993896 146.99993896]
[ 146.99993896 0. 148.99993896 147.99993896]
[ 147.99995422 0. 146.99996948 146.99993896]]
Last layer weights:
[ 11.40957355 0. 11.36747837 10.98625183]
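Since nothing else changes between runs, the good and bad outcomes seem to come down to the random weight initialization. A minimal sketch to reproduce that across several seeds, assuming lasagne.random.set_rng is the RNG the weight initializers draw from (and reusing inputs, targets, states, values, hidden_units and output_dim from the code above):
def train_once(seed, iterations=5000):
    # reseed the RNG used by Lasagne's weight initializers (assumption: set_rng is what they draw from)
    lasagne.random.set_rng(np.random.RandomState(seed))
    net = lasagne.layers.InputLayer(shape=(None, 1, 1, 3), input_var=inputs)
    net = lasagne.layers.DenseLayer(net, hidden_units, nonlinearity=lasagne.nonlinearities.rectify)
    net = lasagne.layers.DenseLayer(net, output_dim)
    pred = lasagne.layers.get_output(net)
    l = lasagne.objectives.squared_error(pred, targets).mean()
    upd = lasagne.updates.sgd(l, lasagne.layers.get_all_params(net, trainable=True), learning_rate=0.01)
    step = theano.function([inputs, targets], l, updates=upd)
    loss = None
    for _ in range(iterations):
        loss = step(states, values)
    return loss

for seed in range(5):
    print "seed %d -> final loss %f" % (seed, train_once(seed))
Some seeds converge to a near-zero loss while others end up with the stuck-at-zero outputs shown above.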
Why is this happening?