
I'm a college student just getting into deep learning, trying to create my first model trained with backpropagation.

However, I keep getting the "Trying to backward through the graph a second time, but the saved intermediate results have already been freed." runtime error.

I have seen many others asking the same question here, but their sample code is often too advanced for me and I don't understand the answers. I have already tried adding convtraining.zero_grad() and loss.sum().backward(retain_graph = True). Neither seems to work.

My own code is the following:

# Import torch.
import torch
import torch.nn as nn

# Define a sample image.
image = torch.tensor([[1, 1, 0, 0, 0],
                      [0, 1, 1, 0, 0],
                      [0, 0, 1, 1, 0],
                      [0, 0, 0, 1, 1],
                      [1, 0, 0, 0, 1]], dtype=torch.float).reshape(1, 1, 5, 5)

# Define a sample kernel.
kernel = torch.tensor([[ 0, -1,  0],
                       [-1,  1, -1],
                       [ 0, -1,  0]], dtype=torch.float).reshape(1, 1, 3, 3)

# Define a convolution layer with the TRUE kernel weights assigned.
convolution = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(3, 3), bias=False)
convolution.weight = nn.Parameter(kernel)
# Define another convolution without kernel weights to PREDICT them.
convtraining = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(3, 3), bias=False)

# Run the first layer.
output_true = convolution(image)
# Reshape its output to make it suitable for comparison.
output_true = output_true.reshape((1, 1, 3, 3))

# Run the second model 5 times.
for i in range(5):
    output_prediction = convtraining(image)
    # Calculate the loss by squaring the error.
    loss = (output_prediction - output_true) ** 2
    convtraining.zero_grad()
    # Backward propagation.
    loss.sum().backward()
    # Adjust the kernel weights.
    convtraining.weight.data[:] -= 3e-2 * convtraining.weight.grad
    print(loss)

The strange thing is it worked before and I don't know what changed. Does anyone know what might be going wrong?

BelindaMK

2 Answers


You have already stated that using retain_graph=True didn't work, but when I tried your exact code with just retain_graph=True added, it ran fine. I'm adding the inside of the for loop below.

for i in range(5):
    output_prediction = convtraining(image)
    # Calculate the loss by squaring the error.
    loss = (output_prediction - output_true) ** 2
    convtraining.zero_grad()
    # Backward propagation.
    loss.sum().backward(retain_graph=True)

    # Adjust the kernel weights.
    convtraining.weight.data[:] -= 3e-2 * convtraining.weight.grad
    print(loss)

The weights of convtraining become:

Parameter containing:
tensor([[[[-0.2317, -0.6439, -0.5469],
          [-0.6850,  0.1293, -0.4237],
          [-0.1236, -0.3529,  0.0662]]]], requires_grad=True)

I also stepped through the code line by line in debug mode. It was giving the reported error on the second iteration, but the error is gone after adding retain_graph=True. If getting the code to run is the whole problem, this seems to work.
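
If you want to see how close the trained kernel gets to the true one, a small check like this (my own addition, assuming convtraining and kernel from the question are still in scope) can be run after the loop:

# Compare the learned kernel with the true kernel after training.
with torch.no_grad():
    print("learned kernel:\n", convtraining.weight.squeeze())
    print("true kernel:\n", kernel.squeeze())
    print("max absolute difference:",
          (convtraining.weight - kernel).abs().max().item())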

afc

When you call backward(), PyTorch walks through the computation graph and computes the gradients. By default, the intermediate results that were saved during the forward pass (and are needed to compute those gradients) are freed once the backward pass finishes; only the final gradients are kept for you to use.

Unfortunately, running another backward pass through the same graph requires those saved intermediate results again. To make sure they are kept around for when you want to run another backward pass, you use retain_graph=True by doing:

loss.sum().backward(retain_graph=True)

instead of

loss.sum().backward()

So, let's say you want to do 3 backward passes; you do this:

loss.sum().backward(retain_graph=True)
loss.sum().backward(retain_graph=True)
loss.sum().backward()

Your code runs normally after adding retain_graph=True in the loop, with an extra backward step at the end that omits retain_graph=True.
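
For illustration, here is a minimal, self-contained sketch (a toy graph of my own, not the question's model) showing the two behaviours:

import torch

# exp() saves its result during the forward pass for use in backward().
x = torch.ones(3, requires_grad=True)
loss = x.exp().sum()

# First backward pass: keep the saved intermediate results around.
loss.backward(retain_graph=True)

# Second (final) backward pass: the graph may now be freed.
loss.backward()

print(x.grad)  # gradients from the two passes accumulate

# If the first call had been plain loss.backward(), the second call
# would raise: "Trying to backward through the graph a second time,
# but the saved intermediate results have already been freed."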

Mohamed Benkedadra