Just for context, I'm trying to implement a gradient descent algorithm with TensorFlow.
I have a matrix X

[ x1 x2 x3 x4 ]
[ x5 x6 x7 x8 ]

which I multiply by some feature vector Y to get Z:

        [ y1 ]
Z = X * [ y2 ]  =  [ z1 ]
        [ y3 ]     [ z2 ]
        [ y4 ]
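(Written out, that's just z1 = x1*y1 + x2*y2 + x3*y3 + x4*y4 and z2 = x5*y1 + x6*y2 + x7*y3 + x8*y4.)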
I then put Z through a softmax function, and take the log. I'll refer to the output matrix as W.
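To make that concrete, here's a tiny NumPy sketch of the softmax-then-log step with made-up numbers (just for illustration, not my actual code):

import numpy as np

z = np.array([[1.0], [2.0]])          # a made-up Z, shape (2, 1)
probs = np.exp(z) / np.exp(z).sum()   # softmax over the two entries
w = np.log(probs)                     # W = log(softmax(Z)), shape (2, 1)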
All this is implemented as follows (a little boilerplate added so it's runnable):

import tensorflow as tf

sess = tf.Session()
num_features = 4
num_actions = 2
# X: the (2, 4) parameter matrix; Y: the (4, 1) feature vector
policy_matrix = tf.get_variable("params", (num_actions, num_features))
state_ph = tf.placeholder("float", (num_features, 1))
# Z = X * Y, shape (2, 1)
action_linear = tf.matmul(policy_matrix, state_ph)
# W = log(softmax(Z)), with the softmax taken over the rows
action_probs = tf.nn.softmax(action_linear, axis=0)
action_problogs = tf.log(action_probs)
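For completeness, this is how I evaluate it (feeding an all-ones state just to check the shapes):

import numpy as np

sess.run(tf.global_variables_initializer())
w_val = sess.run(action_problogs,
                 feed_dict={state_ph: np.ones((num_features, 1))})
print(w_val.shape)  # (2, 1) -- one log-probability per action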
W (corresponding to action_problogs) looks like

[ w1 ]
[ w2 ]
I'd like to find the gradient of w1 with respect to the matrix X - that is, I'd like to calculate

           [ d/dx1 w1 ]
d/dX w1 =  [    ...   ]
           [ d/dx8 w1 ]

(preferably still looking like a matrix so I can add it to X, but I'm really not concerned about that).
I was hoping that tf.gradients would do the trick. I calculated the "gradient" like so:
problog_gradient = tf.gradients(action_problogs, policy_matrix)
However, when I inspect problog_gradient, here's what I get:
[<tf.Tensor 'foo_4/gradients/foo_4/MatMul_grad/MatMul:0' shape=(2, 4) dtype=float32>]
Note that this has exactly the same shape as X, but that it really shouldn't. I was hoping to get a list of two gradients, each with respect to all 8 elements. I suspect that I'm instead getting two gradients, but each with respect to only four elements.
I'm very new to TensorFlow, so I'd appreciate an explanation of what's going on and how I might achieve the behavior I desire.