https://www.tensorflow.org/versions/r1.6/api_docs/python/tf/gradients
The documentation for tf.gradients(ys, xs) describes it as follows:
Constructs symbolic derivatives of sum of ys w.r.t. x in xs
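Written out, the quoted sentence says the following. This is a restatement of the docstring, with y_{kl} standing for the elements of a single ys tensor and x_{ij} for the elements of one x in xs:

```latex
\frac{\partial}{\partial x_{ij}} \sum_{k,l} y_{kl} \;=\; \sum_{k,l} \frac{\partial y_{kl}}{\partial x_{ij}}
```

If each y_{kl} depends only on x_{kl} (an element-wise op), every term with (k,l) \neq (i,j) is zero, so the sum collapses to the single term \partial y_{ij} / \partial x_{ij}.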
I am confused about the summing part. I have read elsewhere that this sums the derivatives dy/dx across the batch for every x in the batch. However, whenever I use it, I don't see this happening. Take the following simple example:
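import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.Session)

x_dims = 3
batch_size = 4
x = tf.placeholder(tf.float32, (None, x_dims))  # batch dimension left unspecified
y = 2*(x**2)                                    # element-wise, so y has the same shape as x
grads = tf.gradients(y, x)                      # a list with one gradient tensor per entry in xs
sess = tf.Session()
x_val = np.random.randint(0, 10, (batch_size, x_dims))
y_val, grads_val = sess.run([y, grads], {x: x_val})
print('x = \n', x_val)
print('y = \n', y_val)
print('dy/dx = \n', grads_val[0])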
This gives the following output:
x =
[[5 3 7]
[2 2 5]
[7 5 0]
[3 7 6]]
y =
[[50. 18. 98.]
[ 8. 8. 50.]
[98. 50. 0.]
[18. 98. 72.]]
dy/dx =
[[20. 12. 28.]
[ 8. 8. 20.]
[28. 20. 0.]
[12. 28. 24.]]
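These values are consistent with the documented sum: since each y[i, j] depends only on x[i, j], the derivative of the scalar sum(y) with respect to x[i, j] is just 4 * x[i, j]. A NumPy-only sketch (no TensorFlow; grad_of_sum is a hypothetical helper) that checks this with central finite differences on the x values printed above:

```python
import numpy as np

def f(x):
    # Same function as in the question: y = 2 * x**2, applied element-wise.
    return 2.0 * x**2

def grad_of_sum(f, x, eps=1e-3):
    """Central finite-difference gradient of the scalar sum(f(x)) w.r.t. every element of x."""
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        x_plus = x.copy()
        x_plus[idx] += eps
        x_minus = x.copy()
        x_minus[idx] -= eps
        # Perturb one element, but differentiate the SUM over all of y.
        g[idx] = (f(x_plus).sum() - f(x_minus).sum()) / (2 * eps)
    return g

x_val = np.array([[5, 3, 7],
                  [2, 2, 5],
                  [7, 5, 0],
                  [3, 7, 6]], dtype=np.float64)

numeric = grad_of_sum(f, x_val)
analytic = 4.0 * x_val  # element-wise dy/dx
print(np.allclose(numeric, analytic))  # True
```

The sum is still there; it just has nothing to add up, because perturbing x[i, j] changes no other entry of y.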
This is the output I would expect: simply the derivative dy/dx = 4x for every element in the batch. I don't see any summing happening. However, I have seen other examples where this operation is followed by dividing by the batch size, to account for tf.gradients() summing the gradients over the batch (see here: https://pemami4911.github.io/blog/2016/08/21/ddpg-rl.html). Why is this necessary?
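The summing over the batch becomes visible when xs is a parameter shared by every example, such as a weight matrix. This is likely the situation in the linked DDPG code, where gradients are taken with respect to network parameters. A minimal NumPy sketch, assuming a hypothetical linear layer y = x @ W (W, out_dims and the seed are illustrative, not taken from the linked post), shows that the gradient of sum(y) with respect to W equals the sum of the per-example gradients:

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, x_dims, out_dims = 4, 3, 2

x = rng.integers(0, 10, size=(batch_size, x_dims)).astype(np.float64)
W = rng.normal(size=(x_dims, out_dims))  # hypothetical parameter shared by every example
y = x @ W                                # shape (batch_size, out_dims)

# d(sum(y))/dW[k, m] = sum over batch i of x[i, k]  (W itself drops out for a linear layer).
grad_W_of_sum = x.T @ np.ones((batch_size, out_dims))

# Per-example gradients d(sum(y[i]))/dW, stacked: shape (batch_size, x_dims, out_dims).
per_example = np.stack([np.outer(x[i], np.ones(out_dims)) for i in range(batch_size)])

# The gradient of the summed ys equals the SUM of the per-example gradients ...
print(np.allclose(grad_W_of_sum, per_example.sum(axis=0)))  # True
# ... so dividing by batch_size turns it into the batch-mean gradient.
print(np.allclose(grad_W_of_sum / batch_size, per_example.mean(axis=0)))  # True
```

Unlike x in the question, W receives a contribution from every row of the batch, so its gradient grows with batch_size unless it is divided out.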
I am using TensorFlow 1.6 and Python 3.