
I create a tf.Variable(), define a simple function of that variable, flatten the original variable with tf.reshape(), and then take tf.gradients() between the function and the flattened variable. Why does that return [None]?

    import numpy as np
    import tensorflow as tf

    var = tf.Variable(np.ones((5, 5)), dtype=tf.float32)
    f = tf.reduce_sum(tf.reduce_sum(tf.square(var)))
    var_f = tf.reshape(var, [-1])
    print(tf.gradients(f, var_f))

The above code block returns [None] when executed. Is this a bug? Please help!

Sri Ramana
  • You have to run it in a `session` as shown in the [basic TF tutorials](https://www.tensorflow.org/versions/r0.12/get_started/basic_usage#launching_the_graph_in_a_session). – jkschin Jun 30 '17 at 05:24
  • @jkschin That's not true in this case. The code is not executing anything in the computation graph, it's only defining the computation graph. Try it for yourself - the snippet works identically with and without a session. – kdbanman Jul 01 '17 at 04:49

1 Answer


You are taking the derivative of f with respect to var_f, but f is a function of var, not var_f. That's why you are getting [None]. If you change the code to:

    var = tf.Variable(np.ones((5, 5)), dtype=tf.float32)
    var_f = tf.reshape(var, [-1])
    f = tf.reduce_sum(tf.reduce_sum(tf.square(var_f)))
    grad = tf.gradients(f, var_f)
    print(grad)

the gradient will be defined:

    [<tf.Tensor 'gradients_28/Square_32_grad/mul_1:0' shape=(25,) dtype=float32>]
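For completeness, here is a minimal sketch (assuming the TensorFlow 1.x session API) that actually evaluates the gradient. Since f is the sum of the squared elements, the gradient with respect to var_f is 2 * var_f, i.e. a vector of 25 twos:

    import numpy as np
    import tensorflow as tf

    var = tf.Variable(np.ones((5, 5)), dtype=tf.float32)
    var_f = tf.reshape(var, [-1])
    f = tf.reduce_sum(tf.square(var_f))  # the nested reduce_sum is redundant
    grad = tf.gradients(f, var_f)[0]     # d(sum(x^2))/dx = 2x

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(grad))  # 25 entries, all 2.0, since var is all ones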

The visualization of the graphs for the following code is given below:

    var = tf.Variable(np.ones((5, 5)), dtype=tf.float32, name='var')
    f = tf.reduce_sum(tf.reduce_sum(tf.square(var)), name='f')
    var_f = tf.reshape(var, [-1], name='var_f')
    grad_1 = tf.gradients(f, var_f, name='grad_1')
    grad_2 = tf.gradients(f, var, name='grad_2')

[Graph visualization: the forward graph (var, var_f, f) with the back-propagation subgraphs for grad_1 and grad_2]

The gradient grad_1 is not defined, while grad_2 is. The back-propagation (gradient) subgraphs for the two calls are shown in the visualization.
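If you want to keep f defined on var but still get the gradient in flattened form, a workaround sketch (my own suggestion, not part of the answer above) is to differentiate with respect to var and reshape the resulting gradient afterwards; the gradient is well defined and the reshape is cheap:

    import numpy as np
    import tensorflow as tf

    var = tf.Variable(np.ones((5, 5)), dtype=tf.float32)
    f = tf.reduce_sum(tf.square(var))

    # Differentiate with respect to var (which f actually depends on),
    # then flatten the (5, 5) gradient into a length-25 vector.
    grad_flat = tf.reshape(tf.gradients(f, var)[0], [-1])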

Vijay Mariappan
  • This answer is simple and good, but I'm still surprised that the gradients don't automatically appear on the reshaped variable. A tensor's shape is stored on an object that wraps the tensor's underlying arrays and other data. When `.reshape` is called, the underlying arrays and (some? all?) other data are reused or maybe recomputed. That's why reshape is fast. So I think it's reasonable to expect a function's tensor dependencies to be tracked with data that gets reused (or at least recomputed) through a reshape operation. But evidently not! I'd really like to know why. – kdbanman Jul 01 '17 at 05:06
  • It's an interesting question, and this is my understanding: you are right that with reshape() the data is reused (not recomputed), but `var_f` will still be a different node in the graph. So when you call `tf.gradients()`, it builds a back-propagation graph, and in this case the node `f` doesn't find a path to node `var_f`. – Vijay Mariappan Jul 01 '17 at 05:27
  • This is a concise way of putting it: "*node `f` doesn't find a path to node `var_f`*". Thank you. As a programmer consuming the reshape interface, I'm still surprised that the path can't be found. I would like to know why the tensorflow devs decided to create a new node in the computation graph for a reshape. And on top of that, I'd like to know why that new `var_f` node isn't given an edge to `f`. – kdbanman Jul 01 '17 at 06:45
  • I was talking about the backpropagation (gradient) graph, which doesn't have a dependency path to the node in question, because it's built to implement the chain rule (see the sketch after this thread). – Vijay Mariappan Jul 01 '17 at 14:27
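To illustrate the path argument from this thread: tf.reshape does have a registered gradient, so gradients flow through it whenever it lies on the forward path between the differentiation target and the output. The call fails here only because var_f is a sibling of f in the graph, not an ancestor. A small sketch (TensorFlow 1.x assumed):

    import numpy as np
    import tensorflow as tf

    var = tf.Variable(np.ones((5, 5)), dtype=tf.float32)
    var_f = tf.reshape(var, [-1])

    # var_f IS on the path from var to g, so the gradient flows back
    # through the reshape and is defined: a (5, 5) tensor of ones.
    g = tf.reduce_sum(var_f)
    print(tf.gradients(g, var))   # [<tf.Tensor ...>], not [None]

    # f does not depend on var_f, so there is no backward path:
    f = tf.reduce_sum(tf.square(var))
    print(tf.gradients(f, var_f))  # [None]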