
I'm using the Lasagne package to build a simple three-layer neural network, and I'm testing it on a very simple dataset (just 4 examples).

X = np.array([[0,0,1],
              [0,1,1],
              [1,0,1],
              [1,1,1]])         

y = np.array([[0, 0],[1, 0],[1, 1],[0, 1]])

However, it fails to learn this and produces the following prediction:

pred = theano.function([input_var], [prediction])
np.round(pred(X), 2)
array([[[ 0.5 ,  0.5 ],
        [ 0.98,  0.02],
        [ 0.25,  0.75],
        [ 0.25,  0.75]]])

Full code:

def build_mlp(input_var=None):
    l_in = lasagne.layers.InputLayer(shape=(None, 3), input_var=input_var)
    l_hid1 = lasagne.layers.DenseLayer(
        l_in, num_units=4,
        nonlinearity=lasagne.nonlinearities.rectify,
        W=lasagne.init.GlorotUniform())
    l_hid2 = lasagne.layers.DenseLayer(
        l_hid1, num_units=4,
        nonlinearity=lasagne.nonlinearities.rectify,
        W=lasagne.init.GlorotUniform())
    l_out = lasagne.layers.DenseLayer(
        l_hid2, num_units=2,
        nonlinearity=lasagne.nonlinearities.softmax)
    return l_out

input_var = T.lmatrix('inputs')
target_var = T.lmatrix('targets')

network = build_mlp(input_var)

prediction = lasagne.layers.get_output(network, deterministic=True)
loss = lasagne.objectives.squared_error(prediction, target_var)
loss = loss.mean()

params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.nesterov_momentum(
    loss, params, learning_rate=0.01, momentum=0.9)

train_fn = theano.function([input_var, target_var], loss, updates=updates)
val_fn = theano.function([input_var, target_var], [loss])

Training:

num_epochs = 1000
for epoch in range(num_epochs):
    inputs, targets = (X, y)
    train_fn(inputs, targets)   

I'm guessing there might be an issue with the nonlinear functions used in the hidden layers, or with the learning method.
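
To check whether the optimizer is making any progress at all, the loss that train_fn returns can be logged every few epochs (a quick diagnostic using only the functions defined above):

# train_fn returns the mean squared-error loss, so printing it
# periodically shows whether the optimizer is actually converging.
for epoch in range(num_epochs):
    loss_val = train_fn(X, y)
    if epoch % 100 == 0:
        print("epoch {}: loss {:.4f}".format(epoch, float(loss_val)))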

Ricky Jones

2 Answers


This is my guess at the problem.

First, why is there a target like [0, 0]? Does that mean the sample doesn't belong to any class?

Second, you are using softmax in the last layer, which is usually used for classification. Are you building this network for classification? If the output confuses you: it is actually the probability of each class, so I think the output is correct (see the sketch after this list):

  • The second sample's prediction is [0.98, 0.02], so it belongs to the first class, matching your target [1, 0].

  • The third sample's prediction is [0.25, 0.75], so it belongs to the second class, matching your target [1, 1] (disregarding your first-class value; since this is classification, the system counts it as correctly classified).

  • The fourth sample's prediction is [0.25, 0.75], so it belongs to the second class, matching your target [0, 1].

  • The first sample's prediction is [0.5, 0.5]. This one seems a bit confusing to me; my guess is that Lasagne treats a sample with equal probability for each class as not belonging to any class.
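
If it helps, here is a minimal sketch (plain NumPy, reusing the pred function from the question) of how those probabilities map to class labels; note that each row of a softmax output sums to 1, so a target like [0, 0] or [1, 1] can never be matched exactly:

probs = pred(X)[0]                          # pred returns a list holding one (4, 2) array
assert np.allclose(probs.sum(axis=1), 1.0)  # softmax rows always sum to 1
labels = np.argmax(probs, axis=1)           # e.g. [0.98, 0.02] -> class 0
print(labels)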

malioboro

I don't think you can really judge whether the model is learning correctly based on the above.

  1. Number of training instances. You have 4 training instances, while the network you constructed has 3*4 + 4*4 + 4*2 = 36 weights to learn, not to mention 4 different output patterns. With far more parameters than examples, the weights are badly underdetermined, which may explain the unexpected results. (A quick way to verify the parameter count is shown after this list.)

  2. How to test whether a model is working. If I wanted to test whether a neural network is learning correctly, I would test it on a known working dataset (like MNIST) and make sure the model learns it with high probability. You could also compare against another neural network library you've already used, or against results from the literature. If I really wanted to go micro, I would use boosting with a linearly separable dataset (see the sketch below).
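
A quick double-check of that parameter count is Lasagne's own counter; note that lasagne.layers.count_params includes the biases as well, so it reports 36 weights + 10 biases = 46 for this network:

print(lasagne.layers.count_params(network, trainable=True))   # -> 46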

If your model still doesn't learn properly, I would be concerned then.
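
For the linearly separable check, here is a minimal sketch of a hypothetical sanity-check dataset (just NumPy; the blob means and sizes are arbitrary choices, not taken from the question):

import numpy as np

# Two well-separated integer blobs (int64, so they match the question's
# T.lmatrix input), with one-hot targets in the same format as y.
rng = np.random.RandomState(0)
n = 100
X0 = np.rint(rng.randn(n, 3) + 5.0).astype('int64')   # class 0 blob
X1 = np.rint(rng.randn(n, 3) - 5.0).astype('int64')   # class 1 blob
X_check = np.vstack([X0, X1])
y_check = np.zeros((2 * n, 2), dtype='int64')
y_check[:n, 0] = 1    # one-hot: class 0
y_check[n:, 1] = 1    # one-hot: class 1

# Feed X_check / y_check through the same train_fn; if the loss does
# not drop towards zero, the training setup itself is at fault.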

jrhee17