I would like to calculate a loss of the following form:

loss = ||u_bc - \hat{u}_bc||^2 + ||u''_r - \hat{u}''_r||^2

where u_bc and \hat{u}_bc are the predicted and exact values of the output at the samples x_1, and u''_r and \hat{u}''_r are the predicted and exact second derivatives of the output at the samples x_2. x_1 and x_2 are different sets of samples.
I am trying to implement it in the following way:
# forward pass to calculate the first loss component
u_bc_pred = self.forward(self.X_u)
loss_bcs = self.loss_fnc(u_bc_pred, self.Y_u)
# the second loss component involves the second derivative of output
u_r_pred = self.forward(self.X_r)
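# dividing by self.sigma_x rescales the derivatives back from the normalized input (chain rule)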
u_x = torch.autograd.grad(u_r_pred, self.X_r, torch.ones_like(u_r_pred), retain_graph=True, create_graph=True)[0] / self.sigma_x
u_xx = torch.autograd.grad(u_x, self.X_r, torch.ones_like(u_x), retain_graph=True, create_graph=False)[0] / self.sigma_x
loss_res = self.loss_fnc(u_xx, self.Y_r)
# Total loss
loss = loss_res + loss_bcs
self.loss_log.append(loss.item())  # log a plain float instead of a tensor
# backpropagation
optimizer.zero_grad()
loss.backward()
optimizer.step()
Training does not seem to decrease the second loss term involving u_xx. Is there anything wrong with the way I compute the loss involving the solution derivatives? Can someone please take a look? Thanks a lot!
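For reference, here is a self-contained sketch of the double torch.autograd.grad pattern on a known function (u = x^3, so the exact second derivative is 6x). Note that this sketch uses create_graph=True in both calls so that u_xx stays differentiable:

import torch

# check the double-grad pattern on u = x^3, where u'' = 6x
x = torch.linspace(-1.0, 1.0, 5).reshape(-1, 1)
x.requires_grad_(True)
u = x ** 3
u_x = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
print(torch.allclose(u_xx, 6 * x))  # prints: True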
Edit: The network is supposed to learn a scalar function; hence, it has one input unit and one output unit. You can think of it as having layers of the form [1, n, n, 1].
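For concreteness, a minimal sketch of such a network (the hidden width n = 50 and the tanh activations here are placeholders, not necessarily what my code uses):

import torch.nn as nn

# illustrative [1, n, n, 1] MLP; width and activation are placeholders
n = 50
model = nn.Sequential(
    nn.Linear(1, n), nn.Tanh(),
    nn.Linear(n, n), nn.Tanh(),
    nn.Linear(n, 1),
)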