
I have a neural network model that represents the surface of an object. For this to work, gradients are computed inside the loss function (for example, a signed distance field (SDF) has the property that its gradient is always of unit length). The loss function is the one SIREN uses for SDFs and is defined as

import torch
import torch.nn.functional as F

import diff_operators

def sdf(model_output, gt):
    gt_sdf = gt['sdf']
    gt_normals = gt['normals']

    coords = model_output['model_in']
    pred_sdf = model_output['model_out'].to(torch.float32)

    gradient = diff_operators.gradient(pred_sdf, coords)

    # gt_sdf != -1 marks on-surface samples; gt_sdf == -1 marks off-surface samples.
    sdf_constraint = torch.where(gt_sdf != -1, pred_sdf, torch.zeros_like(pred_sdf))
    # Off-surface samples are pushed away from a zero prediction.
    inter_constraint = torch.where(gt_sdf != -1, torch.zeros_like(pred_sdf), torch.exp(-1e2 * torch.abs(pred_sdf)))
    # On-surface samples: the gradient should align with the ground-truth normal.
    normal_constraint = torch.where(gt_sdf != -1, 1 - F.cosine_similarity(gradient, gt_normals, dim=-1)[..., None],
                                    torch.zeros_like(gradient[..., :1]))
    # Eikonal term: the gradient of an SDF has unit norm everywhere.
    grad_constraint = torch.abs(gradient.norm(dim=-1) - 1)

    return {'sdf': torch.abs(sdf_constraint).mean() * 3e3,
            'inter': inter_constraint.mean() * 1e2,
            'normal_constraint': normal_constraint.mean() * 1e2,
            'grad_constraint': grad_constraint.mean() * 5e1}
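
For this loss to be differentiable with respect to the coordinates, model_output['model_in'] must be the very tensor that was pushed through the network with gradients enabled. A minimal sketch of a forward pass with that structure, similar in spirit to the SIREN reference code (the class name and architecture here are illustrative assumptions, not the actual network):

import torch

class ImplicitSDF(torch.nn.Module):
    # Illustrative stand-in for the SDF network; only the input/output
    # dictionary structure matters for the loss above.
    def __init__(self, hidden_features=256):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(3, hidden_features), torch.nn.ReLU(),
            torch.nn.Linear(hidden_features, 1))

    def forward(self, coords):
        # Re-attach gradients to a fresh leaf tensor so the loss can take
        # d(model_out)/d(model_in) with torch.autograd.grad.
        coords = coords.clone().detach().requires_grad_(True)
        return {'model_in': coords, 'model_out': self.net(coords)}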

and the gradient calculation uses torch.autograd.grad:

def gradient(y, x, grad_outputs=None):
    if grad_outputs is None:
        grad_outputs = torch.ones_like(y)
    grad = torch.autograd.grad(y, [x], grad_outputs=grad_outputs, create_graph=True)[0]
    return grad
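
As a quick sanity check of this helper (a toy snippet, not part of the training code): for y = sum(x**2), the returned gradient should equal 2*x.

import torch

x = torch.randn(5, 3, requires_grad=True)
y = (x ** 2).sum(dim=-1, keepdim=True)
print(torch.allclose(gradient(y, x), 2 * x))  # True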

Now I wanted to parallelise training using torch.nn.DataParallel, but I get the following error:

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

Is it possible to use torch.nn.DataParallel with gradient calculation in the loss function and what do I need to change to make it work?

Elyora

1 Answer


Looking at the documentation of nn.parallel.DistributedDataParallel:

This module doesn’t work with torch.autograd.grad() (i.e. it will only work if gradients are to be accumulated in .grad attributes of parameters).

It also recommends using torch.distributed.autograd.backward and torch.distributed.optim.DistributedOptimizer.
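
For reference, those APIs follow the RPC-based pattern sketched below. The model, loss function and data loader are placeholders, and the rendezvous configuration is omitted, so treat this as a structural sketch rather than a drop-in replacement for the training loop in the question:

import torch
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc
from torch.distributed.optim import DistributedOptimizer
from torch.distributed.rpc import RRef

def run_worker(rank, world_size, model, loss_fn, data_loader):
    rpc.init_rpc(f"worker{rank}", rank=rank, world_size=world_size)

    # DistributedOptimizer operates on remote references to the parameters.
    param_rrefs = [RRef(p) for p in model.parameters()]
    dist_optim = DistributedOptimizer(torch.optim.Adam, param_rrefs, lr=1e-4)

    for model_input, gt in data_loader:
        with dist_autograd.context() as context_id:
            losses = loss_fn(model(model_input), gt)
            loss = sum(losses.values())
            # Gradients are accumulated in the distributed autograd context,
            # not in the .grad attributes of the parameters.
            dist_autograd.backward(context_id, [loss])
            dist_optim.step(context_id)

    rpc.shutdown()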

The documentation of torch.distributed also recommends using the gloo backend:

Please notice that currently the only backend where all the functions are guaranteed to work is gloo.
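
The backend is selected when the process group is initialised; a minimal sketch, assuming the usual MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE environment variables are set:

import torch.distributed as dist

# "env://" reads the rank, world size and master address from the environment.
dist.init_process_group(backend="gloo", init_method="env://")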

Shai
  • Thanks for your answer! But I am not using ```nn.parallel.DistributedDataParallel``` exactly for that reason, but ```nn.parallel.DataParallel```. – Elyora Aug 04 '21 at 10:45