
I have a model for which I need to compute the gradients of the output w.r.t. the model's input. However, I want to apply custom gradients for some of the nonlinearity functions used in some of the model's layers. So I tried the idea explained here, which computes the rectifier nonlinearity (ReLU) in the forward pass but modifies its gradient in the backward pass. I added the following two classes:

  • The helper class that allows us to replace a nonlinearity with an Op that has the same output, but a custom gradient
import theano

class ModifiedBackprop(object):
    def __init__(self, nonlinearity):
        self.nonlinearity = nonlinearity
        self.ops = {}  # memoizes an OpFromGraph instance per tensor type

    def __call__(self, x):
        # OpFromGraph is oblique to Theano optimizations, so we need to move
        # things to GPU ourselves if needed.
        if theano.sandbox.cuda.cuda_enabled:
            maybe_to_gpu = theano.sandbox.cuda.as_cuda_ndarray_variable
        else:
            maybe_to_gpu = lambda x: x
        # We move the input to GPU if needed.
        x = maybe_to_gpu(x)
        # We note the tensor type of the input variable to the nonlinearity
        # (mainly dimensionality and dtype); we need to create a fitting Op.
        tensor_type = x.type
        # If we did not create a suitable Op yet, this is the time to do so.
        if tensor_type not in self.ops:
            # For the graph, we create an input variable of the correct type:
            inp = tensor_type()
            # We pass it through the nonlinearity (and move to GPU if needed).
            outp = maybe_to_gpu(self.nonlinearity(inp))
            # Then we fix the forward expression...
            op = theano.OpFromGraph([inp], [outp])
            # ...and replace the gradient with our own (defined in a subclass).
            op.grad = self.grad
            # Finally, we memoize the new Op.
            self.ops[tensor_type] = op
        # And apply the memoized Op to the input we got.
        return self.ops[tensor_type](x)
  • The subclass that does guided backpropagation through a nonlinearity:
class GuidedBackprop(ModifiedBackprop):
    def grad(self, inputs, out_grads):
        (inp,) = inputs
        (grd,) = out_grads
        dtype = inp.dtype
        print('It works')
        return (grd * (inp > 0).astype(dtype) * (grd > 0).astype(dtype),)
  • Then I used them in my code as follows:
import numpy as np
import theano
import theano.tensor as T
import lasagne as nn

model_in = T.tensor3()
# model_in = net['input'].input_var
nn.layers.set_all_param_values(net['l_out'], model['param_values'])

relu = nn.nonlinearities.rectify
relu_layers = [layer for layer in nn.layers.get_all_layers(net['l_out'])
               if getattr(layer, 'nonlinearity', None) is relu]
modded_relu = GuidedBackprop(relu)

# Swap the rectify nonlinearity for the modified one in every ReLU layer.
for layer in relu_layers:
    layer.nonlinearity = modded_relu

prop = nn.layers.get_output(net['l_out'], model_in, deterministic=True)

for sample in range(ini, batch_len):
    model_out = prop[sample, 'z']   # get prop for label 'z'
    gradients = theano.gradient.jacobian(model_out, wrt=model_in)
    # gradients = theano.grad(model_out, wrt=model_in)
    get_gradients = theano.function(inputs=[model_in], outputs=gradients)
    grads = get_gradients(X_batch)  # X_batch has the same shape as model_in: (64, 20, 32)
    grads = np.array(grads)
    grads = grads[sample]
Now when I run the code, it runs without any error and the shape of the output is correct. But that is only because it executes the default theano.grad function, not the one that is supposed to override it. In other words, the grad() function in the GuidedBackprop class is never invoked (the 'It works' message is never printed).
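One quick way to check this (a diagnostic sketch on my side, assuming the prop and gradients expressions from the snippet above) is to print the graphs and look for OpFromGraph nodes; if only the plain rectifier shows up, the replacement never made it into the graph:

import theano

# If GuidedBackprop took effect, OpFromGraph nodes should show up in the
# forward graph...
theano.printing.debugprint(prop)
# ...and the masked expression from GuidedBackprop.grad in the gradient graph.
theano.printing.debugprint(gradients)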

  1. I can't understand what the issue is. What am I missing?
  2. Is there a solution?
  3. If this is an unresolved issue, is there an implementation of a Theano Op that can achieve such functionality, or some other way to override the gradient of specific nonlinearity functions applied on some of the model's layers? (A rough sketch of the kind of Op I mean follows this list.)
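For reference, this is a minimal sketch of the kind of custom Op I have in mind, following the standard theano.Op interface. The name GuidedReLU and the exact details are only illustrative; I have not verified this against my model:

import numpy as np
import theano
import theano.tensor as T

class GuidedReLU(theano.Op):
    # A rectifier whose backward pass follows the guided-backprop rule.
    __props__ = ()

    def make_node(self, x):
        x = T.as_tensor_variable(x)
        return theano.Apply(self, [x], [x.type()])

    def perform(self, node, inputs, output_storage):
        # Forward pass: a plain ReLU computed with NumPy.
        (x,) = inputs
        output_storage[0][0] = np.maximum(x, 0).astype(x.dtype)

    def grad(self, inputs, output_grads):
        # Backward pass: only let the gradient through where both the input
        # and the incoming gradient are positive.
        (x,) = inputs
        (g,) = output_grads
        dtype = x.dtype
        return [g * (x > 0).astype(dtype) * (g > 0).astype(dtype)]

If something like this is viable, it could presumably be assigned as layer.nonlinearity = GuidedReLU() instead of patching OpFromGraph instances.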

1 Answer


Are you trying to set the value of the model output back into the model layer's input for all of the gradient calculations?

import tensorflow as tf

group_1_ShoryuKen_Left = tf.constant(
    [0,0,0,0,0,1,0,0,0,0,0,0, 0,0,0,0,0,1,0,1,0,0,0,0,
     0,0,0,0,0,0,0,1,0,0,0,0, 0,0,0,0,0,0,0,0,0,1,0,0],
    shape=(1, 1, 48), dtype=tf.float32)

## layer_2 = tf.keras.layers.Dense(256, kernel_initializer=tf.constant_initializer(1.))
layer_2 = tf.keras.layers.LSTM(32, kernel_initializer=tf.constant_initializer(1.))
b_out = layer_2(group_1_ShoryuKen_Left)
layer_2.set_weights(layer_1.get_weights())  # layer_1 is defined elsewhere in the original code

[Image: gradient values]
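As a rough sketch of how such gradient values could be reproduced (my assumption: using tf.GradientTape with a stand-in input x of the same shape; this is not necessarily how the pictures were produced):

import tensorflow as tf

# Stand-in input with the same shape as group_1_ShoryuKen_Left above.
x = tf.random.uniform((1, 1, 48), dtype=tf.float32)
layer_2 = tf.keras.layers.LSTM(32, kernel_initializer=tf.constant_initializer(1.))

with tf.GradientTape() as tape:
    tape.watch(x)        # x is a plain tensor, so it has to be watched explicitly
    b_out = layer_2(x)   # forward pass through the LSTM
grads = tape.gradient(b_out, x)   # d(b_out)/d(x), shape (1, 1, 48)
print(grads.shape)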

  • I need to get the gradients of each output corresponding to one of the output classes (not only the max) w.r.t. the input. But while getting the gradients, for some of the layers (e.g., the nonlinearity layers) I need to apply some constraints on the gradient calculation, for example only propagating back the positive signals, similar to guided backprop. The problem is not the custom function that I'll apply; the problem is that I can't override the grad function for those layers. – HATEM EL-AZAB Mar 02 '22 at 02:35
  • I understand that the maximum value can create new gradients and output for the next layer, so there is no need to set the gradients back. Can you explain the reasons for doing that? – Jirayu Kaewprateep Mar 02 '22 at 10:40
  • Gradients on the max help show what the model likes to see in order to obtain the target class; in addition, they help show what the model likes to see in order to minimize the possibility of obtaining the other classes. A clearer view can be obtained either by getting the gradients back starting from the layer just before the softmax, or by starting from the last layer (i.e., the softmax layer) and doing your calculations based on what you get. The latter is more tedious, since you need to get the gradients from each class output backward. – HATEM EL-AZAB Mar 03 '22 at 00:23
  • Does the output with the best weights also provide the target results? – Jirayu Kaewprateep Mar 04 '22 at 14:04