
As part of the project work for my master's degree I am implementing a neural network using the TensorFlow library from Google. In the output layer of my feed-forward neural network I would like to determine several labels in parallel, and I want to use the softmax function as the activation of that layer. Specifically, what I want is an output vector that looks like this:

vec = [0.1, 0.8, 0.1,   0.3, 0.2, 0.5]

Here the first three numbers are the probabilities of the three classes of the first classification and the other three numbers are the probabilities of the three classes of the second classification. So in this case I would say that the labels are:

[ class2 , class3 ]
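
Just to illustrate how I read the labels off such a vector (this is not part of the network, only a small NumPy sketch):

import numpy as np

vec = np.array([0.1, 0.8, 0.1, 0.3, 0.2, 0.5])
# view the vector as two classifications with three class probabilities each
groups = vec.reshape(2, 3)
# most probable class per classification: indices [1, 2], i.e. [class2, class3]
labels = groups.argmax(axis=1)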

In a first attempt I tried to implement this by first reshaping the (1x6) vector to a (2x3) matrix with tf.reshape(), then applying the softmax function to the matrix with tf.nn.softmax(), and finally reshaping the matrix back to a vector. Unfortunately, due to the reshaping, the gradient descent optimizer had problems calculating the gradient, so I tried something different.
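
Roughly, that first attempt looked like this (with vec being the (1x6) output of the network):

reshaped = tf.reshape(vec, [2, 3])             # (1x6) -> (2x3)
softmaxed = tf.nn.softmax(reshaped)            # softmax is applied to each row of three entries
outputSoftmax = tf.reshape(softmaxed, [1, 6])  # back to (1x6)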

What I do now is take the (1x6) vector and multiply it by a matrix that has a (3x3) identity matrix in the upper part and a (3x3) zero matrix in the lower part. With this I extract the first three entries of the vector. Then I can apply the softmax function and bring it back into the old (1x6) form by another matrix multiplication. This has to be repeated for the other three vector entries as well.

maskA = tf.constant([[1.,0.,0.],[0.,1.,0.],[0.,0.,1.],[0.,0.,0.],[0.,0.,0.],[0.,0.,0.]])
maskB = tf.constant([[0.,0.,0.],[0.,0.,0.],[0.,0.,0.],[1.,0.,0.],[0.,1.,0.],[0.,0.,1.]])
outputSoftmax = (tf.matmul(tf.nn.softmax(tf.matmul(vec, maskA)), tf.transpose(maskA))
                 + tf.matmul(tf.nn.softmax(tf.matmul(vec, maskB)), tf.transpose(maskB)))

It works so far, but I don't like this solution, because in my real problem I have to determine not just two labels at a time but 91, so I would have to repeat the procedure above 91 times.

Does anyone have a solution for how I can obtain the desired vector, where the softmax function is applied to only three entries at a time, without writing the "same" code 91 times?

OmG

1 Answer


You could apply the tf.split function to obtain 91 tensors (one per classification), then apply softmax to each of them:

# all_in_one is the combined output tensor; split it into 91 parts along dimension 0
classes_split = tf.split(0, 91, all_in_one)

for c in classes_split:
    softmax_class = tf.nn.softmax(c)
    # use softmax_class to compute some loss, add it to overall loss

or instead of computing the loss directly, you could also concatenate them together again:

classes_split = tf.split(0, 91, all_in_one)

# softmax each split individually
classes_split_softmaxed = [tf.nn.softmax(c) for c in classes_split]
# Concatenate again
all_in_one_softmaxed = tf.concat(0, classes_split_softmaxed)
panmari
  • Thanks a lot for your help. I now have a nice solution :-) I first use the tf.split function as you suggested, then apply tf.nn.softmax() to every part, and at the end I merge it back together with tf.concat(). – Miss Princess Dec 18 '15 at 09:31
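
Put together, the solution described in the comment looks roughly like this (a sketch, assuming the network output is kept as a (1, 273) row vector of 91 groups of 3 logits, and using the same old-style tf.split/tf.concat argument order as the answer; in TensorFlow 1.0 and later the argument order of both functions changed):

import tensorflow as tf

num_labels, classes_per_label = 91, 3
# assumed placeholder standing in for the (1, 273) output of the last layer
logits = tf.placeholder(tf.float32, [1, num_labels * classes_per_label])

# split along dimension 1 into 91 tensors of shape (1, 3)
classes_split = tf.split(1, num_labels, logits)
# softmax each group of three logits independently
classes_split_softmaxed = [tf.nn.softmax(c) for c in classes_split]
# concatenate back into a single (1, 273) vector of per-group probabilities
all_in_one_softmaxed = tf.concat(1, classes_split_softmaxed)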