I am trying to turn the mnist.py
tutorial into a multi-label classification example using a CNN. I am very new to this field, and my goal is to understand the architecture and the input/output data format so that I can run my own multi-label classification problems.
From what I have read so far, the modifications I have made are:
Change the training and test labels to 2D numpy arrays, where the first dimension (rows) corresponds to instances and the second dimension (columns) to the label(s) assigned to each instance. For the MNIST example, if I want to convert it into a 10-label multi-label classification problem, y_train and y_test should look like
y = np.array([[0, 1, 1, 0, 1, 0, 0, 0, 0, 0], [0, 1, 1, 0, 1, 0, 0, 0, 1, 1],....[1, 1, 1, 0, 1, 0, 0, 0, 0, 1]]);
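For illustration, here is a minimal sketch of how such a multi-hot matrix could be built from per-instance lists of label indices (the label lists below are made up just to reproduce the rows above):

import numpy as np

# Hypothetical per-instance label lists (which classes are present in each instance)
labels_per_instance = [[1, 2, 4], [1, 2, 4, 8, 9], [0, 1, 2, 4, 9]]

num_classes = 10
y = np.zeros((len(labels_per_instance), num_classes), dtype=np.int32)
for row, labels in enumerate(labels_per_instance):
    y[row, labels] = 1  # put a 1 in every column whose label is present

# y is now the 2D multi-hot matrix shown above:
# [[0 1 1 0 1 0 0 0 0 0]
#  [0 1 1 0 1 0 0 0 1 1]
#  [1 1 1 0 1 0 0 0 0 1]]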
In the network code, I change the non-linearity of the final layer from softmax to sigmoid:
network = lasagne.layers.DenseLayer(
        lasagne.layers.dropout(network, p=.5),
        num_units=10,
        nonlinearity=lasagne.nonlinearities.sigmoid)
The num_units remains 10, right? Again, it is still a 10-class classification problem, or not? For the Theano variables, I change target_var from a T.ivector to a T.imatrix:

target_var = T.imatrix('targets')
Then, following this post, I change the loss functions from categorical cross-entropy to binary cross-entropy:

loss = lasagne.objectives.binary_crossentropy(prediction, target_var)
test_loss = lasagne.objectives.binary_crossentropy(test_prediction, target_var)
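One detail I assume carries over from the tutorial: binary_crossentropy, like categorical_crossentropy, returns one loss value per entry, so I still reduce it to a scalar with .mean() before handing it to the updates. A small self-contained sketch of that part (the symbolic variables here are just stand-ins; in the real script prediction comes from lasagne.layers.get_output(network)):

import theano.tensor as T
import lasagne

# Symbolic stand-ins for the network output and the multi-hot targets
prediction = T.fmatrix('prediction')
target_var = T.imatrix('targets')

# binary_crossentropy gives one loss per (instance, label) entry,
# so it is reduced to a scalar with .mean(), as in the original tutorial
loss = lasagne.objectives.binary_crossentropy(prediction, target_var)
loss = loss.mean()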
I also changed the test_acc variable because I was getting an error about mismatched input sizes. The solution from the previous post seems to work:

binaryPrediction = test_prediction > .5
test_acc = T.mean(T.eq(binaryPrediction, target_var))
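To check my understanding of what this accuracy actually measures, here is a tiny plain-numpy version with made-up numbers (not real MNIST outputs); it counts the fraction of correctly predicted (instance, label) entries rather than the fraction of instances that get every label right:

import numpy as np

# Made-up predictions for 2 instances and 3 labels
test_prediction = np.array([[0.9, 0.2, 0.7],
                            [0.1, 0.8, 0.4]])
targets = np.array([[1, 0, 0],
                    [0, 1, 1]], dtype=np.int32)

binaryPrediction = test_prediction > .5
test_acc = np.mean(binaryPrediction == targets)
print(test_acc)  # 4 of the 6 (instance, label) entries match -> 0.666...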
Well, I get no errors now, but I am not entirely sure the code is correct. When you are doing multi-label classification, you have to use other metrics than the typical accuracy, right? Any suggestions?