This is a question about PyTorch's `autograd.grad` and the `backward` function specifically.
I have two tensors `a` and `b` which are optimized over (i.e. they require gradients). I define `loss1, loss2 = f(a, b), g(a, b)`. Although these are two separate functions `f` and `g`, for computational-efficiency reasons I have to compute both of them together as `fg(a, b)`, which returns the tuple `(loss1, loss2)`.
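For concreteness, here is a minimal sketch of the setup (the body of `fg` is just a placeholder; only its signature matters):

```python
import torch

# stand-ins for the real tensors being optimized
a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
opt_a = torch.optim.SGD([a], lr=0.1)
opt_b = torch.optim.SGD([b], lr=0.1)

def fg(a, b):
    # placeholder losses; the real f and g share expensive intermediate
    # work, which is why they must be computed together
    loss1 = (a * b).sum()         # plays the role of f(a, b)
    loss2 = (a - b).pow(2).sum()  # plays the role of g(a, b)
    return loss1, loss2
```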
I need to use `opt_a` and `opt_b` (optimizers) to step `a` and `b` with the following gradients:

- `a.grad` should equal ∂loss1/∂a
- `b.grad` should equal ∂loss2/∂b
How can I achieve these gradients? I know I can run `autograd.grad(loss1, a)` and `autograd.grad(loss2, b)` to get the true gradients and assign them to `.grad` manually, but I want to use the `backward` method on `loss1` and `loss2`.
I want to use the `backward` method because it gives concise code: in my case, `a` and `b` are actually the parameter lists of two neural networks, and I don't want to manually set `param.grad = ...` for every `param` in `model1.parameters()`.
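For reference, this is what the `autograd.grad` route looks like in the sketch above; with real models, each of these lines becomes a loop over `model.parameters()`, which is exactly the verbosity I want to avoid:

```python
loss1, loss2 = fg(a, b)

# retain_graph=True because loss1 and loss2 share parts of the same graph
(grad_a,) = torch.autograd.grad(loss1, a, retain_graph=True)
(grad_b,) = torch.autograd.grad(loss2, b)

a.grad = grad_a  # exactly d(loss1)/d(a)
b.grad = grad_b  # exactly d(loss2)/d(b)
opt_a.step()
opt_b.step()
```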
Is there a clean way to do this with `.backward()`?
My Attempt
I have tried multiple orderings of the following, but none of them work, because for at least one variable the gradients accumulate:
```python
loss1, loss2 = fg(...)
opt1.zero_grad()
loss1.backward(retain_graph=True)
opt1.step()       # modifies a in place
opt2.zero_grad()
loss2.backward()  # fails: a tensor needed for this backward was changed in place
opt2.step()
```
Different orderings of this result either in an accumulation of gradients (`a` gets stepped with ∂(loss1 + loss2)/∂a), or in one optimizer stepping the value of `a` first, after which I can't run `loss2.backward()` because of an in-place operation error.
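For example, this ordering (again in terms of the sketch above) avoids the in-place error but steps `a` with the accumulated gradient:

```python
loss1, loss2 = fg(a, b)
opt_a.zero_grad()
opt_b.zero_grad()
loss1.backward(retain_graph=True)  # writes d(loss1)/d(a) and d(loss1)/d(b)
loss2.backward()                   # adds d(loss2)/d(a) and d(loss2)/d(b)
opt_a.step()  # uses a.grad = d(loss1 + loss2)/d(a), not d(loss1)/d(a)
opt_b.step()  # uses b.grad = d(loss1 + loss2)/d(b), not d(loss2)/d(b)
```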