
I have a neural network model that represents the surface of an object. For this to work, gradients are computed inside the loss function (for example, a signed distance field (SDF) has the property that its gradient is always of unit length). The loss function is the one SIREN uses for SDFs and is defined as

import torch
import torch.nn.functional as F

import diff_operators

def sdf(model_output, gt):
    gt_sdf = gt['sdf']
    gt_normals = gt['normals']

    coords = model_output['model_in']
    pred_sdf = model_output['model_out'].to(torch.float32)

    gradient = diff_operators.gradient(pred_sdf, coords)

    # gt_sdf != -1 marks on-surface samples; gt_sdf == -1 marks off-surface samples.
    sdf_constraint = torch.where(gt_sdf != -1, pred_sdf, torch.zeros_like(pred_sdf))
    # Off-surface samples are pushed away from a zero prediction.
    inter_constraint = torch.where(gt_sdf != -1, torch.zeros_like(pred_sdf), torch.exp(-1e2 * torch.abs(pred_sdf)))
    # On-surface samples: the gradient should align with the ground-truth normal.
    normal_constraint = torch.where(gt_sdf != -1, 1 - F.cosine_similarity(gradient, gt_normals, dim=-1)[..., None],
                                    torch.zeros_like(gradient[..., :1]))
    # Eikonal term: the gradient of an SDF has unit norm everywhere.
    grad_constraint = torch.abs(gradient.norm(dim=-1) - 1)

    return {'sdf': torch.abs(sdf_constraint).mean() * 3e3,
            'inter': inter_constraint.mean() * 1e2,
            'normal_constraint': normal_constraint.mean() * 1e2,
            'grad_constraint': grad_constraint.mean() * 5e1}
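
For this loss to be differentiable with respect to the coordinates, model_output['model_in'] must be the very tensor that was pushed through the network with gradients enabled. A minimal sketch of a forward pass with that structure, similar in spirit to the SIREN reference code (the class name and architecture here are illustrative assumptions, not the actual network):

import torch

class ImplicitSDF(torch.nn.Module):
    # Illustrative stand-in for the SDF network; only the input/output
    # dictionary structure matters for the loss above.
    def __init__(self, hidden_features=256):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(3, hidden_features), torch.nn.ReLU(),
            torch.nn.Linear(hidden_features, 1))

    def forward(self, coords):
        # Re-attach gradients to a fresh leaf tensor so the loss can take
        # d(model_out)/d(model_in) with torch.autograd.grad.
        coords = coords.clone().detach().requires_grad_(True)
        return {'model_in': coords, 'model_out': self.net(coords)}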

and the gradient calculation uses torch.autograd.grad:

def gradient(y, x, grad_outputs=None):
    if grad_outputs is None:
        grad_outputs = torch.ones_like(y)
    grad = torch.autograd.grad(y, [x], grad_outputs=grad_outputs, create_graph=True)[0]
    return grad
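
As a quick sanity check of this helper (a toy snippet, not part of the training code): for y = sum(x**2), the returned gradient should equal 2*x.

import torch

x = torch.randn(5, 3, requires_grad=True)
y = (x ** 2).sum(dim=-1, keepdim=True)
print(torch.allclose(gradient(y, x), 2 * x))  # True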

Now I wanted to parallelise training using torch.nn.DataParallel, but I get the following error:

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

Is it possible to use torch.nn.DataParallel with gradient calculation in the loss function and what do I need to change to make it work?

Elyora

1 Answer


Looking at the documentation of nn.parallel.DistributedDataParallel:

This module doesn’t work with torch.autograd.grad() (i.e. it will only work if gradients are to be accumulated in .grad attributes of parameters).

It also recommends using torch.distributed.autograd.backward and torch.distributed.optim.DistributedOptimizer.
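
For reference, those APIs follow the RPC-based pattern sketched below. The model, loss function and data loader are placeholders, and the rendezvous configuration is omitted, so treat this as a structural sketch rather than a drop-in replacement for the training loop in the question:

import torch
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc
from torch.distributed.optim import DistributedOptimizer
from torch.distributed.rpc import RRef

def run_worker(rank, world_size, model, loss_fn, data_loader):
    rpc.init_rpc(f"worker{rank}", rank=rank, world_size=world_size)

    # DistributedOptimizer operates on remote references to the parameters.
    param_rrefs = [RRef(p) for p in model.parameters()]
    dist_optim = DistributedOptimizer(torch.optim.Adam, param_rrefs, lr=1e-4)

    for model_input, gt in data_loader:
        with dist_autograd.context() as context_id:
            losses = loss_fn(model(model_input), gt)
            loss = sum(losses.values())
            # Gradients are accumulated in the distributed autograd context,
            # not in the .grad attributes of the parameters.
            dist_autograd.backward(context_id, [loss])
            dist_optim.step(context_id)

    rpc.shutdown()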

The documentation of torch.distributed also recommends using the gloo backend:

Please notice that currently the only backend where all the functions are guaranteed to work is gloo.
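
The backend is selected when the process group is initialised; a minimal sketch, assuming the usual MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE environment variables are set:

import torch.distributed as dist

# "env://" reads the rank, world size and master address from the environment.
dist.init_process_group(backend="gloo", init_method="env://")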

Shai
  • Thanks for your answer! But I am not using ```nn.parallel.DistributedDataParallel``` exactly for that reason, but ```nn.parallel.DataParallel```. – Elyora Aug 04 '21 at 10:45