8

I am trying to compute a loss on the jacobian of the network (i.e. to perform double backprop), and I get the following error: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

I can't find the inplace operation in my code, so I don't know which line to fix.

The error occurs in the last line:

loss3.backward()

inputs_reg = Variable(data, requires_grad=True)
output_reg = self.model.forward(inputs_reg)

num_classes = output.size()[1]
jacobian_list = []
grad_output = torch.zeros(*output_reg.size())

if inputs_reg.is_cuda:
    grad_output = grad_output.cuda()
    jacobian_list = jacobian.cuda()

for i in range(10):

    zero_gradients(inputs_reg)
    grad_output.zero_()
    grad_output[:, i] = 1
    jacobian_list.append(torch.autograd.grad(outputs=output_reg,
                                      inputs=inputs_reg,
                                      grad_outputs=grad_output,
                                      only_inputs=True,
                                      retain_graph=True,
                                      create_graph=True)[0])


jacobian = torch.stack(jacobian_list, dim=0)
loss3 = jacobian.norm()
loss3.backward()
Einav
  • `grad_output.zero_()` seems like an in-place operation. You might have in-place operations in `self.model`. – Shai Dec 09 '18 at 11:31
  • `grad_output.zero_()` is the in-place operation. In PyTorch, in-place operations end with an underscore. I think you wanted to write `grad_output.zero_grad()`. – kHarshit Dec 09 '18 at 11:32
  • I need to zero grad_output before I set the new column (corresponding to the output that I want the gradient to be calculated for) to ones, so I changed grad_output.zero_() to grad_output[:, i-1] = 0 and it did not help. – Einav Dec 09 '18 at 12:53
  • Actually, what I described above is replacing one in-place operation with another. – Einav Dec 09 '18 at 13:25

5 Answers

8

You can use the set_detect_anomaly function available in the autograd package to find exactly which line is responsible for the error.

Here is the link which describes the same problem and a solution using the above-mentioned function.
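For instance, a minimal sketch of how to switch it on (the model and tensors below are placeholders, not the code from the question):

import torch

# Record forward-pass tracebacks so a backward error reports the op that caused it.
torch.autograd.set_detect_anomaly(True)

# Placeholder model and input, just to show where the switch goes.
model = torch.nn.Linear(4, 3)
x = torch.randn(2, 4, requires_grad=True)

loss = model(x).norm()
loss.backward()  # if an in-place op broke the graph, the traceback now names it

# Or limit it to a single block with the context manager:
# with torch.autograd.detect_anomaly():
#     loss.backward()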

yottabytt
6

grad_output.zero_() is in-place, and so is grad_output[:, i-1] = 0. In-place means "modify a tensor instead of returning a new one with the modifications applied". A solution that is not in-place is torch.where. Here is an example that zeroes out column 1:

import torch
t = torch.randn(3, 3)
ixs = torch.arange(3, dtype=torch.int64)
zeroed = torch.where(ixs[None, :] == 1, torch.tensor(0.), t)

zeroed
tensor([[-0.6616,  0.0000,  0.7329],
        [ 0.8961,  0.0000, -0.1978],
        [ 0.0798,  0.0000, -1.2041]])

t
tensor([[-0.6616, -1.6422,  0.7329],
        [ 0.8961, -0.9623, -0.1978],
        [ 0.0798, -0.7733, -1.2041]])

Notice how t retains the values it had before and zeroed has the values you want.
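Applied to the question's loop, a minimal sketch of building grad_output out of place with torch.where (batch and num_classes are placeholder values standing in for output_reg.size()):

import torch

batch, num_classes = 4, 10          # stand-ins for output_reg.size()
col = torch.arange(num_classes)

for i in range(num_classes):
    # a brand-new tensor each iteration: 1.0 in column i, 0.0 elsewhere
    grad_output = torch.where(col[None, :] == i,
                              torch.tensor(1.),
                              torch.tensor(0.)).expand(batch, num_classes)
    # ... pass grad_output as grad_outputs= to torch.autograd.grad as before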

Jatentaki
0

Thanks! I replaced the problematic in-place operation on grad_output with the following:

inputs_reg = Variable(data, requires_grad=True)
output_reg = self.model.forward(inputs_reg)
num_classes = output.size()[1]

jacobian_list = []
grad_output = torch.zeros(*output_reg.size())

if inputs_reg.is_cuda:
    grad_output = grad_output.cuda()

for i in range(5):
    zero_gradients(inputs_reg)

    # work on a fresh copy instead of zeroing the shared buffer in place
    grad_output_curr = grad_output.clone()
    grad_output_curr[:, i] = 1
    jacobian_list.append(torch.autograd.grad(outputs=output_reg,
                                             inputs=inputs_reg,
                                             grad_outputs=grad_output_curr,
                                             only_inputs=True,
                                             retain_graph=True,
                                             create_graph=True)[0])

jacobian = torch.stack(jacobian_list, dim=0)
loss3 = jacobian.norm()
loss3.backward()
Einav
  • Please note the `grad_output_curr[:, i] = 1` line is still an in-place operation and may (or may not) cause trouble further down the line. – Jatentaki Dec 09 '18 at 13:32
0

I hope your problem got solved. I had this problem, and solutions like using clone() did not work for me, but when I installed PyTorch version 1.4 it was solved.
I think this problem is a kind of bug in the step() function. The weird thing is that the bug shows up with PyTorch 1.5 but not with 1.4.
You can see all released versions of PyTorch in this link.

Omid Khalaf Beigi
0

I met this error while doing PPO (Proximal Policy Optimization). I solved it by defining a target network and a main network. At the beginning, the target network has the same parameter values as the main network. During training, the target network's parameters are synchronized with the main network's every fixed number of time steps. The details can be found in the code: https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_colab.ipynb
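A rough sketch of that kind of periodic synchronization (with hypothetical main_net / old_net modules, not the code from the linked notebook):

import copy
import torch

# Hypothetical stand-ins for the two policy networks.
main_net = torch.nn.Linear(8, 2)
old_net = copy.deepcopy(main_net)   # starts with identical parameters

sync_every = 100
for step in range(1, 1001):
    # ... compute the loss against old_net, update main_net with its optimizer ...
    if step % sync_every == 0:
        # copy the up-to-date parameters into the frozen copy
        old_net.load_state_dict(main_net.state_dict())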

Mingming Qiu