When using a torch.nn.BCELoss() on two arguments that are both results of some earlier computation, I get a curious error, which this question is about:

RuntimeError: the derivative for 'target' is not implemented

The MCVE is as follows:
import torch
import torch.nn.functional as F
net1 = torch.nn.Linear(1,1)
net2 = torch.nn.Linear(1,1)
loss_fcn = torch.nn.BCELoss()
x = torch.zeros((1,1))
y = F.sigmoid(net1(x)) #make sure y is in range (0,1)
z = F.sigmoid(net2(y)) #make sure z is in range (0,1)
loss = loss_fcn(z, y) #works if we replace y with y.detach()
loss.backward()
It turns out that if we call .detach() on y, the error disappears. But this results in a different computation: in the .backward() pass, the gradients with respect to the second argument of the BCELoss (the target) will no longer be computed.
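For concreteness, here is the detached variant as a self-contained sketch (I use torch.sigmoid instead of the deprecated F.sigmoid; behavior is the same). Note that net1 still receives a gradient, but only through the z path, not through the target argument:

```python
import torch

net1 = torch.nn.Linear(1, 1)
net2 = torch.nn.Linear(1, 1)
loss_fcn = torch.nn.BCELoss()

x = torch.zeros((1, 1))
y = torch.sigmoid(net1(x))  # in (0, 1)
z = torch.sigmoid(net2(y))  # in (0, 1)

# Detaching y for the target argument: backward() now succeeds,
# but no gradient flows into net1 through the target path.
loss = loss_fcn(z, y.detach())
loss.backward()
```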
Can anyone explain what I'm doing wrong in this case? As far as I know, all PyTorch modules in torch.nn should support computing gradients. And the error message seems to tell me that the derivative is not implemented for y, which is strange: you can compute the gradient of y, but not of y.detach(), which seems contradictory.
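For completeness, one workaround I can think of is to write out the binary cross-entropy by hand instead of using torch.nn.BCELoss, so that autograd differentiates through both arguments; this is only a sketch (the clamping epsilon is my own addition for numerical safety, not part of the original code):

```python
import torch

net1 = torch.nn.Linear(1, 1)
net2 = torch.nn.Linear(1, 1)

x = torch.zeros((1, 1))
y = torch.sigmoid(net1(x))  # "target", still differentiable
z = torch.sigmoid(net2(y))  # "input"

# Binary cross-entropy written out explicitly; clamp avoids log(0).
eps = 1e-7
zc = z.clamp(eps, 1 - eps)
loss = -(y * torch.log(zc) + (1 - y) * torch.log(1 - zc)).mean()

# No error here: gradients flow into both net1 and net2,
# including through the target argument y.
loss.backward()
```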