I'm interested in finding the gradient of a neural network's output with respect to its parameters (weights and biases).
More specifically, assume I have a neural network with the structure [6, 4, 3, 1] and an input batch of 20 samples. What I'm interested in is finding the gradient of the network output w.r.t. the weights (and biases), which, if I'm not mistaken, amounts to 47 parameters in this case. In the literature this gradient is sometimes known as the weight Jacobian.
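To spell that count out: the weight matrices contribute 6*4 + 4*3 + 3*1 = 39 entries and the bias vectors contribute 4 + 3 + 1 = 8, for 47 parameters in total. A quick sanity check of the arithmetic in Python:

layers = [6, 4, 3, 1]
n_weights = sum(i * o for i, o in zip(layers[:-1], layers[1:]))  # 24 + 12 + 3 = 39
n_biases = sum(layers[1:])                                       # 4 + 3 + 1 = 8
print(n_weights + n_biases)                                      # 47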
I'm using PyTorch version 0.4.0 with Python 3.6 in a Jupyter notebook.
The code I have produced is this:
import numpy.random as npr
import torch

def init_params(layer_sizes, scale=0.1, rs=npr.RandomState(0)):
    # One (weight matrix, bias vector) pair per pair of adjacent layers
    return [(rs.randn(insize, outsize) * scale,  # weight matrix
             rs.randn(outsize) * scale)          # bias vector
            for insize, outsize in zip(layer_sizes[:-1], layer_sizes[1:])]
layers = [6, 4, 3, 1]
w = init_params(layers)
# As of PyTorch 0.4.0, the Variable wrapper is no longer needed;
# tensors track gradients directly via requires_grad.
first_layer_w = torch.tensor(w[0][0], requires_grad=True)
first_layer_bias = torch.tensor(w[0][1], requires_grad=True)
second_layer_w = torch.tensor(w[1][0], requires_grad=True)
second_layer_bias = torch.tensor(w[1][1], requires_grad=True)
third_layer_w = torch.tensor(w[2][0], requires_grad=True)
third_layer_bias = torch.tensor(w[2][1], requires_grad=True)
X = torch.tensor(X_batch, requires_grad=True)  # X_batch: any 20*6 input batch
hidden1 = torch.tanh(torch.mm(X, first_layer_w) + first_layer_bias)
hidden2 = torch.tanh(torch.mm(hidden1, second_layer_w) + second_layer_bias)
output = torch.tanh(torch.mm(hidden2, third_layer_w) + third_layer_bias)  # shape: 20*1
output.backward()
As is evident from the code, I'm using the hyperbolic tangent as the nonlinearity. The code produces an output vector of length 20. Now I'm interested in finding the gradient of this output vector w.r.t. all the parameters (all 47 of them). I have read the PyTorch documentation here, and I have also seen similar questions, for example here. However, I have failed to find the gradient of the output vector w.r.t. the parameters. If I use the PyTorch function backward(), it generates an error:
RuntimeError: grad can be implicitly created only for scalar outputs
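If I understand the error correctly, backward() can only create the initial gradient implicitly when the output is a scalar; for my 20*1 output I would have to pass an explicit gradient argument, and even then, as far as I can tell, I would only get a vector-Jacobian product (the gradient of a weighted sum of the outputs), not the full Jacobian. A minimal sketch of what I mean, reusing the tensors defined above:

# Backpropagating a vector of ones gives d(sum of outputs)/d(param),
# i.e. the column sums of the 20*47 Jacobian, not the Jacobian itself.
output.backward(torch.ones_like(output))
print(first_layer_w.grad.shape)  # torch.Size([6, 4])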
My question is: is there a way to calculate the gradient of the output vector w.r.t. the parameters, which could essentially be represented as a 20*47 matrix, since the output vector has length 20 and the parameter vector has length 47? If so, how? Is there anything wrong in my code? You can take any example of X as long as its dimension is 20*6.
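For completeness, the best workaround I have come up with so far is the sketch below, which calls torch.autograd.grad once per output element (so one backward pass per Jacobian row, which is presumably slow) and stacks the flattened gradients into rows. Is there a better or more vectorized way than this?

params = [first_layer_w, first_layer_bias,
          second_layer_w, second_layer_bias,
          third_layer_w, third_layer_bias]
n_params = sum(p.numel() for p in params)  # 47

jacobian = torch.zeros(output.shape[0], n_params, dtype=torch.float64)  # 20*47
for i in range(output.shape[0]):
    # Gradient of the i-th output w.r.t. every parameter tensor;
    # retain_graph keeps the graph alive for the next iteration.
    grads = torch.autograd.grad(output[i, 0], params, retain_graph=True)
    jacobian[i] = torch.cat([g.reshape(-1) for g in grads])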