As part of the project work for my master's degree, I am implementing a neural network using the TensorFlow library from Google. At the output layer of my feed-forward neural network I would like to determine several labels in parallel, and I want to use the softmax function as the activation function of the output layer. Specifically, what I want is an output vector that looks like this:
vec = [0.1, 0.8, 0.1, 0.3, 0.2, 0.5]
Here the first three numbers are the probabilities of the three classes of the first classification, and the other three numbers are the probabilities of the three classes of the second classification. So in this case I would say that the labels are:
[ class2 , class3 ]
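To make this concrete, this is how I read the labels off such a vector (a minimal sketch using NumPy; the class names are only illustrative):

import numpy as np

vec = np.array([0.1, 0.8, 0.1, 0.3, 0.2, 0.5])
groups = vec.reshape(2, 3)                                         # one row per classification
labels = ["class" + str(i + 1) for i in np.argmax(groups, axis=1)]
print(labels)                                                      # ['class2', 'class3']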
In a first attempt I tried to implement this by reshaping the (1x6) vector into a (2x3) matrix with tf.reshape(), applying the softmax function to the matrix with tf.nn.softmax(), and finally reshaping the matrix back into a vector. Unfortunately, due to the reshaping, the gradient descent optimizer had problems calculating the gradient, so I tried something different.
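For reference, that first attempt looked roughly like this (a sketch, assuming vec is the (1x6) output tensor of the last layer):

import tensorflow as tf

reshaped = tf.reshape(vec, [2, 3])             # (1x6) -> (2x3), one row per classification
softmaxed = tf.nn.softmax(reshaped)            # softmax is applied to each row separately
outputSoftmax = tf.reshape(softmaxed, [1, 6])  # back to the original (1x6) shape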
What I do now is take the (1x6) vector and multiply it by a matrix that has a (3x3) identity matrix in the upper part and a (3x3) zero matrix in the lower part. With this I extract the first three entries of the vector. Then I can apply the softmax function and bring the result back into the old (1x6) form by another matrix multiplication. This has to be repeated for the other three vector entries as well.
P1 = tf.constant([[1., 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0], [0, 0, 0]])  # picks out entries 1-3
P2 = tf.constant([[0., 0, 0], [0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])  # picks out entries 4-6
outputSoftmax = tf.matmul(tf.nn.softmax(tf.matmul(vec, P1)), tf.transpose(P1)) \
              + tf.matmul(tf.nn.softmax(tf.matmul(vec, P2)), tf.transpose(P2))
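As a self-contained quick check (with an arbitrary, hypothetical input vector), each group of three entries in the result should sum to 1:

import tensorflow as tf

vec = tf.constant([[1.0, 2.0, 3.0, 0.5, 0.5, 0.5]])    # arbitrary example input
P1 = tf.constant([[1., 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0], [0, 0, 0]])
P2 = tf.constant([[0., 0, 0], [0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
outputSoftmax = tf.matmul(tf.nn.softmax(tf.matmul(vec, P1)), tf.transpose(P1)) \
              + tf.matmul(tf.nn.softmax(tf.matmul(vec, P2)), tf.transpose(P2))

with tf.Session() as sess:
    print(sess.run(outputSoftmax))                      # each group of three entries sums to 1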
It works so far, but I don't like this solution. In my real problem I not only have to determine two labels at a time but 91, so I would have to repeat the procedure from above 91 times.
Does anyone have a solution for how I can obtain the desired vector, where the softmax function is applied to only three entries at a time, without writing the "same" code 91 times?