
I'm trying to use a custom loss function by extending nn.Module, but I can't get past the error:

element 0 of variables does not require grad and does not have a grad_fn

Note: my labels are lists of size num_samples, but each batch has the same labels throughout, so I reduce the labels for the whole batch to a single label by calling .diag().
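For reference, here is a minimal sketch of what that .diag() reduction does, assuming labels comes out of the dataloader as a square (batch_size x batch_size) tensor whose rows all repeat the same label list (the exact shape is my assumption, not stated above):

import torch

# hypothetical batch of 3 samples; every row repeats the same label list
labels = torch.tensor([[0, 2, 1],
                       [0, 2, 1],
                       [0, 2, 1]])

# .diag() on a 2-D tensor extracts the main diagonal,
# leaving one label per sample
label = labels.diag().float()
print(label)  # tensor([0., 2., 1.])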

My code is as follows and is based on the transfer learning tutorial:

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                scheduler.step()
                model.train(True)  # Set model to training mode
            else:
                model.train(False)  # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for data in dataloaders[phase]:
                # get the inputs
                inputs, labels = data
                inputs = inputs.float()


                # wrap them in Variable
                if use_gpu:
                    inputs = Variable(inputs.cuda())
                    labels = Variable(labels.cuda())
                else:
                    inputs = Variable(inputs)
                    labels = Variable(labels)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                outputs = model(inputs)
                #outputs = nn.functional.sigmoid(outputs).round()
                _, preds = torch.max(outputs, 1)
                label = labels.diag().float()
                preds = preds.float()
                loss = criterion(preds, label)
                # backward + optimize only if in training phase
                if phase == 'train':
                    loss.backward()
                    optimizer.step()

                # statistics
                running_loss += loss.data[0] * inputs.size(0)
                running_corrects += torch.sum(preds == label.data)

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

and my loss function is defined below:

class CustLoss(nn.Module):
    def __init__(self):
        super(CustLoss, self).__init__()
    def forward(self, outputs, labels):
        return cust_loss(outputs, labels)

def cust_loss(pred, targets):
    '''preds are arrays of size classes with floats in them;
    targets are arrays of all the classes from the batch;
    we sum the classes from the batch and find the num correct'''
    r = torch.sum(pred == targets)
    return r

Then I run the following to run the model:

model_ft = models.resnet18(pretrained=True)
for param in model_ft.parameters():
    param.requires_grad = False

num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 3)

if use_gpu:
    model_ft = model_ft.cuda()

criterion = CustLoss()

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.fc.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,num_epochs=25)

I tried getting it to work with other loss functions to no avail. I always get the same error when loss.backward() is called.

It was my understanding that I wouldn't need a custom implementation of loss.backward if I extend nn.Module.

Matthew Ciaramitaro

1 Answer


You are subclassing nn.Module to define a function, in your case a loss function. So when you compute loss.backward(), it tries to store the gradients in the loss itself instead of in the model, and there is no variable in the loss in which to store the gradients. Your loss needs to be a function and not a module. See Extending autograd.

You have two options here:

  1. The easiest one is to pass the cust_loss function directly as the criterion parameter to train_model (see the first sketch below).
  2. You can extend torch.autograd.Function to define the custom loss (and, if you wish, the backward function as well); see the second sketch below.
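For option 1, based on the driver code you posted, the only change is to pass cust_loss itself instead of wrapping it in a module:

model_ft = train_model(model_ft, cust_loss, optimizer_ft, exp_lr_scheduler, num_epochs=25)

For option 2, a minimal skeleton of extending torch.autograd.Function (static-method style from the Extending autograd docs) might look like the following. The class name CustLossFunction is mine, and the squared-error body is only a differentiable placeholder, since your pred == targets count has zero gradient almost everywhere; treat it as a sketch, not a drop-in implementation:

import torch

class CustLossFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, pred, targets):
        # save the tensors needed to compute gradients in backward
        ctx.save_for_backward(pred, targets)
        # placeholder: a differentiable squared-error surrogate
        return ((pred - targets) ** 2).mean()

    @staticmethod
    def backward(ctx, grad_output):
        pred, targets = ctx.saved_tensors
        # gradient of the mean squared error with respect to pred
        grad_pred = grad_output * 2.0 * (pred - targets) / pred.numel()
        # no gradient is needed with respect to the targets
        return grad_pred, None

# usage: loss = CustLossFunction.apply(pred, targets)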

P.S. - It is mentioned that you need to implement the backward of custom loss functions. This is not always the case: it is required only when your loss function is non-differentiable at some point. But I do not think you'll need to do that.

layog