I'm trying to use TensorFlow's @tf.custom_gradient functionality to assign a custom gradient to a function with multiple inputs. I can put together a working setup for only one input, but not for two or more.
I've based my code on TensorFlow's custom_gradient documentation, which works just fine for one input, as in this example:
import os
# Suppress TensorFlow startup info (set before importing
# TensorFlow so the environment variable actually takes effect)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow as tf

# Custom gradient decorator on a function,
# as described in the documentation
@tf.custom_gradient
def my_identity(x):
    # The custom gradient
    def grad(dy):
        return dy
    # Return the result AND the gradient
    return tf.identity(x), grad

# Make a variable, run it through the custom op
x = tf.get_variable('x', initializer=1.)
y = my_identity(x)

# Calculate loss, make an optimizer, train the variable
loss = tf.abs(y)
opt = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train = opt.minimize(loss)

# Start a TensorFlow session, initialize variables, train
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train)
This example runs silently, then closes. No issues, no errors. The variable optimizes as expected. However, in my application, I need to do such a calculation with multiple inputs, so something of this form:
@tf.custom_gradient
def my_identity(x, z):
    def grad(dy):
        return dy
    return tf.identity(x*z), grad
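For reference, the rest of the script is essentially unchanged from the single-input version; the call site just gets a second variable (the initializer value for z below is a placeholder I picked, nothing meaningful):
# Make two variables, run them through the two-input custom op
x = tf.get_variable('x', initializer=1.)
z = tf.get_variable('z', initializer=2.)
y = my_identity(x, z)

# Same loss, optimizer, and training step as before
loss = tf.abs(y)
opt = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train = opt.minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train)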
Running this in place of the example (with the second variable input added to the call of my_identity, as above) results in the following error output. As best I can tell, the last part of the error comes from the dynamic generation of the op -- the format matches the protobuf-style node definition used when an op is registered (though that's about all I know about it).
Traceback (most recent call last):
  File "testing.py", line 27, in <module>
    train = opt.minimize(loss)
  File "/usr/lib/python3/dist-packages/tensorflow/python/training/optimizer.py", line 400, in minimize
    grad_loss=grad_loss)
  File "/usr/lib/python3/dist-packages/tensorflow/python/training/optimizer.py", line 519, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/usr/lib/python3/dist-packages/tensorflow/python/ops/gradients_impl.py", line 630, in gradients
    gate_gradients, aggregation_method, stop_gradients)
  File "/usr/lib/python3/dist-packages/tensorflow/python/ops/gradients_impl.py", line 821, in _GradientsHelper
    _VerifyGeneratedGradients(in_grads, op)
  File "/usr/lib/python3/dist-packages/tensorflow/python/ops/gradients_impl.py", line 323, in _VerifyGeneratedGradients
    "inputs %d" % (len(grads), op.node_def, len(op.inputs)))
ValueError: Num gradients 2 generated for op name: "IdentityN"
op: "IdentityN"
input: "Identity"
input: "x/read"
input: "y/read"
attr {
  key: "T"
  value {
    list {
      type: DT_FLOAT
      type: DT_FLOAT
      type: DT_FLOAT
    }
  }
}
attr {
  key: "_gradient_op_type"
  value {
    s: "CustomGradient-9"
  }
}
do not match num inputs 3
Based on other custom gradient examples I've seen, I surmised that the issue was the lack of a supplied gradient for the second input argument. So, I changed my function to this:
@tf.custom_gradient
def my_identity(x, z):
    def grad(dy):
        return dy
    return tf.identity(x*z), grad, grad
This results in the following more familiar error:
Traceback (most recent call last):
  File "testing.py", line 22, in <module>
    y = my_identity(x, z)
  File "/usr/lib/python3/dist-packages/tensorflow/python/ops/custom_gradient.py", line 111, in decorated
    return _graph_mode_decorator(f, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/tensorflow/python/ops/custom_gradient.py", line 132, in _graph_mode_decorator
    result, grad_fn = f(*args)
ValueError: too many values to unpack (expected 2)
The @tf.custom_gradient decorator only recognizes the last returned element as the gradient. So, I tried putting the two gradients into a tuple as (grad, grad), such that the function would still only have "two" outputs. TensorFlow rejected this too, this time because it can't call a tuple the way it would call the gradient function -- entirely reasonable, in hindsight.
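Concretely (with everything else unchanged), that attempt looked roughly like this:
@tf.custom_gradient
def my_identity(x, z):
    def grad(dy):
        return dy
    # Pack both gradients into a single tuple so the function still
    # returns exactly two things; TensorFlow accepts the unpacking but
    # later fails when it tries to call the tuple as the gradient function
    return tf.identity(x*z), (grad, grad)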
I've fussed around with the example some more, but to no avail. No matter what I try, I can't get the custom-defined gradient to handle multiple inputs. I'm hoping that somebody with more knowledge of custom ops and gradients than I have will have a better idea on this -- thanks in advance for the help!