
I read all the posts on the net addressing the issue where people forgot to change the target vector to a matrix, and since the problem remains after this change, I decided to ask my question here. Workarounds are mentioned below, but new problems show up, and I am thankful for suggestions!

Using a convolutional network setup and binary crossentropy with a sigmoid activation function, I get a dimension mismatch problem, but not during training, only during validation / test data evaluation. For some strange reason, one of my validation set vectors gets its dimensions switched and I have no idea why. Training, as mentioned above, works fine. The code follows below, most of it copied from the lasagne tutorial example; thanks a lot for any help (and sorry for hijacking the thread, but I saw no reason to create a new one).

Workarounds and new problems:

  1. Removing "axis=1" from the valAcc definition makes the error go away, but validation accuracy remains zero and the test classification always returns the same result, no matter how many nodes, layers, or filters I use. Even changing the training set size (I have around 350 samples per class, 48x64 grayscale images) does not change this, so something seems off.

Network creation:

import lasagne

def build_cnn(imgSet, input_var=None):
    # A CNN of two convolution + pooling stages
    # and a fully-connected hidden layer in front of the output layer.

    # Input layer using shape information from training
    network = lasagne.layers.InputLayer(
            shape=(None, imgSet.shape[1], imgSet.shape[2], imgSet.shape[3]),
            input_var=input_var)
    # We do not apply input dropout, as it tends to work less well
    # for convolutional layers.

    # Convolutional layer with 32 kernels of size 5x5. Strided and padded
    # convolutions are supported as well; see the docstring.
    network = lasagne.layers.Conv2DLayer(
            network, num_filters=32, filter_size=(5, 5),
            nonlinearity=lasagne.nonlinearities.rectify,
            W=lasagne.init.GlorotUniform())

    # Max-pooling layer of factor 2 in both dimensions:
    network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))

    # Another convolution with 16 5x5 kernels, and another 2x2 pooling:
    network = lasagne.layers.Conv2DLayer(
            network, num_filters=16, filter_size=(5, 5),
            nonlinearity=lasagne.nonlinearities.rectify)

    network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))

    # A fully-connected layer of 64 units with 25% dropout on its inputs:
    network = lasagne.layers.DenseLayer(
            lasagne.layers.dropout(network, p=.25),
            num_units=64,
            nonlinearity=lasagne.nonlinearities.rectify)

    # And, finally, the single-unit sigmoid output layer with 50% dropout on its inputs:
    network = lasagne.layers.DenseLayer(
            lasagne.layers.dropout(network, p=.5),
            num_units=1,
            nonlinearity=lasagne.nonlinearities.sigmoid)

    return network

Target matrices for all sets are created like this (training targets as an example):

    targetsTrain = np.vstack((targetsTrain, [[targetClass], ] * numTr))
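
For reference, a minimal numpy sketch of what this stacking produces (the class labels and sample counts are made up for illustration); each class appends a block of identical labels, so the final target matrix has shape (num_samples, 1):

    import numpy as np

    # hypothetical example: 3 samples of class 0 followed by 2 samples of class 1
    targetsTrain = np.empty((0, 1), dtype='int8')          # start with an empty (0, 1) matrix
    targetsTrain = np.vstack((targetsTrain, [[0], ] * 3))  # append the class-0 targets
    targetsTrain = np.vstack((targetsTrain, [[1], ] * 2))  # append the class-1 targets

    print(targetsTrain.shape)  # (5, 1): one row per sample, one column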

The theano variables and functions are defined as follows:

import theano
import theano.tensor as T
from theano import function
import lasagne

inputVar = T.tensor4('inputs')
targetVar = T.imatrix('targets')
network = build_cnn(trainset, inputVar)

# training loss and updates
predictions = lasagne.layers.get_output(network)
loss = lasagne.objectives.binary_crossentropy(predictions, targetVar)
loss = loss.mean()
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.nesterov_momentum(loss, params, learning_rate=0.01, momentum=0.9)

# deterministic pass for validation / testing
valPrediction = lasagne.layers.get_output(network, deterministic=True)
valLoss = lasagne.objectives.binary_crossentropy(valPrediction, targetVar)
valLoss = valLoss.mean()
valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar), dtype=theano.config.floatX)

train_fn = function([inputVar, targetVar], loss, updates=updates, allow_input_downcast=True)
val_fn = function([inputVar, targetVar], [valLoss, valAcc])

Finally, here are the two loops, training and validation. The first works fine, the second throws the error; excerpts follow below.

# -- Neural network training itself -- #
numIts = 100
for itNr in range(0, numIts):
    train_err = 0
    train_batches = 0
    for batch in iterate_minibatches(trainset.astype('float32'), targetsTrain.astype('int8'), len(trainset)//4, shuffle=True):
        inputs, targets = batch
        print(inputs.shape)
        print(targets.shape)
        train_err += train_fn(inputs, targets)
        train_batches += 1

    # And a full pass over the validation data:
    val_err = 0
    val_acc = 0
    val_batches = 0

    for batch in iterate_minibatches(valset.astype('float32'), targetsVal.astype('int8'), len(valset)//3, shuffle=False):
        inputs, targets = batch
        err, acc = val_fn(inputs, targets)
        val_err += err
        val_acc += acc
        val_batches += 1

Error (excerpts):

Exception "unhandled ValueError"
Input dimension mis-match. (input[0].shape[1] = 52, input[1].shape[1] = 1)
Apply node that caused the error: Elemwise{eq,no_inplace}(DimShuffle{x,0}.0, targets)
Toposort index: 36
Inputs types: [TensorType(int64, row), TensorType(int32, matrix)]
Inputs shapes: [(1, 52), (52, 1)]
Inputs strides: [(416, 8), (4, 4)]
Inputs values: ['not shown', 'not shown']

Again, thanks for the help!

gilgamash

1 Answer


So it seems the error is in the evaluation of the validation accuracy. When you remove "axis=1" from your calculation, the argmax is taken over the entire prediction array and returns only a single number; broadcasting then steps in, which is why you see the same value for the whole set.
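
Roughly what happens, sketched in plain numpy (the prediction values and targets are made up): without an axis, argmax collapses the whole batch into one scalar index, and broadcasting then compares that single number against every entry of the (52, 1) target matrix:

    import numpy as np

    valPrediction = np.random.rand(52, 1)        # made-up sigmoid outputs, shape (52, 1)
    targets = np.random.randint(0, 2, (52, 1))   # made-up binary targets, shape (52, 1)

    idx = np.argmax(valPrediction)               # no axis: one scalar index into the flattened array
    acc = np.mean(idx == targets)                # the scalar is broadcast against all 52 targets
    print(idx, acc)                              # usually idx > 1, so the "accuracy" is simply 0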

But from the error you posted, the "T.eq" op throws because it has to compare a 52 x 1 with a 1 x 52 matrix (theano/numpy treat both as matrices rather than vectors). So I suggest you try replacing the line with:

    valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar.T))

I hope this fixes the error, but I haven't tested it myself.

EDIT: The error lies in the argmax op that is called. Normally, argmax is there to determine which of the output units is activated the most. However, in your setting you only have one output neuron, which means that the argmax over all output neurons will always return 0 (the index of the first and only unit).

This is why you have the impression that your network always gives you 0 as output.
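
A quick numpy illustration of that point (the values are made up): with a single output unit there is only one candidate per row, so argmax along axis=1 can never return anything but 0:

    import numpy as np

    valPrediction = np.array([[0.1], [0.9], [0.7]])   # one sigmoid output per sample
    print(np.argmax(valPrediction, axis=1))           # [0 0 0] -- always the first (and only) unit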

By replacing:

    valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar.T))

with:

    binaryPrediction = valPrediction > .5
    valAcc = T.mean(T.eq(binaryPrediction, targetVar.T))

you should get the desired result.
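
As a plain-numpy sanity check (with made-up values, the (52, 1) shapes from your error message, and ignoring the transpose question for the moment), thresholding the sigmoid output at 0.5 gives one binary prediction per sample, which is then compared element-wise against the targets:

    import numpy as np

    valPrediction = np.random.rand(52, 1)          # made-up sigmoid outputs, shape (52, 1)
    targets = np.random.randint(0, 2, (52, 1))     # made-up binary targets, shape (52, 1)

    binaryPrediction = valPrediction > .5          # element-wise threshold, still (52, 1)
    valAcc = np.mean(binaryPrediction == targets)  # fraction of correct predictions
    print(valAcc)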

I'm just not sure whether the transpose is still necessary or not.

romeasy
  • As mentioned above, the dimension mismatch occurs when axis=1 is there. As soon as I remove it, the error vanishes but the training does not seem to work. I also tried to flatten predictions, which leads to another dimension error. – gilgamash Feb 12 '16 at 08:24
  • Ok, using the transpose works, but now in each step validation accuracy remains unchanged from first to last iteration... – gilgamash Feb 12 '16 at 08:26
  • Can you post the dimensions of the network's output? Your input seems to be a tensor4 with shape _batchsize_ x _stacksize_ x _row_ x _col_ or not? – romeasy Feb 12 '16 at 08:31
  • Yes, input is as guessed, target size is (batchsize, stacksize), where in both cases stacksize = 1. Predictions are Shape.0 – gilgamash Feb 12 '16 at 08:33
  • No, they are not. Shape.0 is only the output of the theano var, you can try to output the shape by temporarily replacing the valAcc with just valPrediction in the function declaration. That is: val_fn = function([inputVar, valPrediction], [valLoss, valAcc]) This way, theano outputs the validation predictions to you and you can then call "acc.shape" within the batch for loop. – romeasy Feb 12 '16 at 08:43
  • Alternatively, you can call lasagne's "get_output_shape" function instead of the workaround that I just posted – romeasy Feb 12 '16 at 08:49
  • I replaced valAcc (not targetVar) with valPrediction and printed the shape of acc: (52, 1), where 52 is the fraction of the validation set I am using for cross validation, so this seems ok to me. Thanks a lot for the insight btw! – gilgamash Feb 12 '16 at 08:53
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/103264/discussion-between-gilgamash-and-romeasy). – gilgamash Feb 12 '16 at 08:55
  • Thanks again, romeasy, for an absolutely smashing chat help! Upvoted! – gilgamash Feb 12 '16 at 09:54
  • One more thing: as valAcc evaluates T.eq, the prediction vector must be binarized beforehand, e.g. valPrediction = valPrediction > 0.5, as romeasy states! – gilgamash Feb 15 '16 at 06:29