
I'm interested in finding the gradient of a neural network's output with respect to its parameters (weights and biases).

More specifically, assume I have the following network structure: [6, 4, 3, 1]. The input batch has 20 samples. I want the gradient of the network output w.r.t. the weights and biases, of which there are, if I'm not mistaken, 47 in this case. In the literature, this gradient is sometimes called the Weight_Jacobian.

I'm using PyTorch 0.4.0 with Python 3.6 in a Jupyter Notebook.

The code I have written is:

import numpy.random as npr
import torch

def init_params(layer_sizes, scale=0.1, rs=npr.RandomState(0)):
    return [(rs.randn(insize, outsize) * scale,   # weight matrix
             rs.randn(outsize) * scale)           # bias vector
            for insize, outsize in
            zip(layer_sizes[:-1], layer_sizes[1:])]

layers = [6, 4, 3, 1]
w = init_params(layers)

# torch.tensor(..., requires_grad=True) already tracks gradients, so the
# deprecated Variable wrapper is not needed in PyTorch >= 0.4
first_layer_w = torch.tensor(w[0][0], requires_grad=True)
first_layer_bias = torch.tensor(w[0][1], requires_grad=True)
second_layer_w = torch.tensor(w[1][0], requires_grad=True)
second_layer_bias = torch.tensor(w[1][1], requires_grad=True)
third_layer_w = torch.tensor(w[2][0], requires_grad=True)
third_layer_bias = torch.tensor(w[2][1], requires_grad=True)

X_batch = npr.randn(20, 6)   # any 20 x 6 input batch
X = torch.tensor(X_batch, requires_grad=True)

# forward pass through the three tanh layers; output has shape (20, 1)
hidden1 = torch.tanh(torch.mm(X, first_layer_w) + first_layer_bias)
hidden2 = torch.tanh(torch.mm(hidden1, second_layer_w) + second_layer_bias)
output = torch.tanh(torch.mm(hidden2, third_layer_w) + third_layer_bias)

# fails with: RuntimeError: grad can be implicitly created only for scalar outputs
output.backward()

As the code shows, I'm using the hyperbolic tangent as the non-linearity. The code produces an output vector of length 20. Now I want the gradient of this output vector w.r.t. all 47 parameters. I have read the PyTorch documentation here and have seen similar questions, for example here, but I have failed to compute the gradient of the output vector w.r.t. the parameters. If I call the PyTorch function backward(), it raises the error

RuntimeError: grad can be implicitly created only for scalar outputs

My question is: is there a way to calculate the gradient of the output vector w.r.t. the parameters, which could essentially be represented as a 20*47 matrix since the output vector has length 20 and the parameter vector has length 47? If so, how? Is there anything wrong in my code? You can take any example of X as long as its dimensions are 20*6.

Sibghat Khan

3 Answers


You're trying to compute a Jacobian of a function, while PyTorch is expecting you to compute vector-Jacobian products. You can see an in-depth discussion of computing Jacobians with PyTorch here.

You have two options. The first is to use JAX or autograd and its jacobian() function. The second is to stick with PyTorch and compute 20 vector-Jacobian products, by calling backward(vec) 20 times, where vec is a length-20 one-hot vector whose non-zero index ranges from 0 to 19. If this is confusing, I recommend reading the autodiff cookbook from the JAX tutorials.
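
A minimal sketch of the second option, assuming the output tensor and the six weight/bias tensors from the question are already defined; it uses torch.autograd.grad with an explicit grad_outputs vector, which is equivalent to calling backward(vec) and reading the accumulated .grad buffers:

params = [first_layer_w, first_layer_bias,
          second_layer_w, second_layer_bias,
          third_layer_w, third_layer_bias]

num_outputs = output.numel()                      # 20
num_params = sum(p.numel() for p in params)       # 47

jacobian = torch.zeros(num_outputs, num_params, dtype=output.dtype)
for i in range(num_outputs):
    vec = torch.zeros_like(output)
    vec.view(-1)[i] = 1.0                         # one-hot vector selecting output i
    # retain_graph=True because we backpropagate through the same graph 20 times
    grads = torch.autograd.grad(output, params, grad_outputs=vec,
                                retain_graph=True)
    jacobian[i] = torch.cat([g.reshape(-1) for g in grads])

print(jacobian.shape)                             # torch.Size([20, 47])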

Nick McGreivy

You need to use torch.autograd.functional.jacobian() as the other answers have stated, but to compute it with respect to the parameters rather than the inputs you need to write a custom function that takes the network parameters as inputs and returns the network output given those parameters. The Jacobian of this function is what you are looking for. There is some nice discussion of this problem here.
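
A minimal sketch of that idea for the [6, 4, 3, 1] tanh network from the question (the net helper and its randomly initialised parameters below are illustrative only, and torch.autograd.functional requires PyTorch 1.5 or newer):

import torch
from torch.autograd.functional import jacobian

X = torch.randn(20, 6)                            # any 20 x 6 input batch

def net(w1, b1, w2, b2, w3, b3):
    # the weights and biases are the function's inputs, so the Jacobian
    # is taken with respect to them
    h1 = torch.tanh(X @ w1 + b1)
    h2 = torch.tanh(h1 @ w2 + b2)
    return torch.tanh(h2 @ w3 + b3).squeeze(-1)   # shape (20,)

params = (torch.randn(6, 4), torch.randn(4),
          torch.randn(4, 3), torch.randn(3),
          torch.randn(3, 1), torch.randn(1))

jac = jacobian(net, params)                       # tuple, one Jacobian per parameter tensor
weight_jacobian = torch.cat([j.reshape(20, -1) for j in jac], dim=1)
print(weight_jacobian.shape)                      # torch.Size([20, 47])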

Tom Ryan
  • "takes network parameters as inputs" this is exactly what `inputs` in my answer refers to... – iacob May 16 '23 at 21:15
  • You copied my answer and added what should just have been a [comment](https://stackoverflow.com/help/privileges/comment), not a new answer. This isn't how this site works, if you think an existing answer can be improved submit an edit or post a comment on it (like [here](https://stackoverflow.com/questions/50175711/pytorch-gradient-of-output-w-r-t-parameters/76266263?noredirect=1#comment134499646_66885946)). – iacob May 17 '23 at 12:07

The matrix of partial derivatives of a function with respect to its parameters is known as the Jacobian, and can be computed in PyTorch with:

torch.autograd.functional.jacobian(func, inputs)
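
For example, a minimal usage sketch (the small Sequential model and the random input below are placeholders) that differentiates a model's output with respect to the tensor passed as inputs:

import torch

model = torch.nn.Sequential(torch.nn.Linear(6, 4), torch.nn.Tanh(),
                            torch.nn.Linear(4, 1))
x = torch.randn(20, 6)

# Jacobian of the (20, 1) output with respect to the (20, 6) input
jac = torch.autograd.functional.jacobian(model, x)
print(jac.shape)                                  # torch.Size([20, 1, 20, 6])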
iacob