
I am trying to use nn.BCEWithLogitsLoss() for a model that initially used nn.CrossEntropyLoss(). However, after changing the training function to accommodate nn.BCEWithLogitsLoss(), the reported model accuracy values are greater than 1. Please find the code below.
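For context, here is a minimal sketch (separate from my actual code; the variable names are just for illustration) of the target shapes and dtypes the two losses expect, assuming a batch of 4 samples and 2 classes:

import torch
import torch.nn as nn

# nn.CrossEntropyLoss(): one logit column per class, integer class-index targets
logits_two_cols = torch.randn(4, 2)
targets_indices = torch.tensor([0, 1, 1, 0])             # shape [4], dtype long
print(nn.CrossEntropyLoss()(logits_two_cols, targets_indices))

# nn.BCEWithLogitsLoss(): a single logit per sample, float targets of the same shape
logits_one_col = torch.randn(4, 1)
targets_binary = torch.tensor([[0.], [1.], [1.], [0.]])  # shape [4, 1], dtype float
print(nn.BCEWithLogitsLoss()(logits_one_col, targets_binary))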

import copy
import os
import time

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torchvision import datasets, models, transforms

# Data augmentation and normalization for training
# Just normalization for validation
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

data_dir = '/kaggle/input/catsndogsorg/hymenoptera_data'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                             shuffle=True, num_workers=4)
              for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

#############

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print(f'Epoch {epoch}/{num_epochs - 1}')
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device).unsqueeze(1)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    #print(outputs, labels)
                    loss = criterion(outputs, labels.float())
                    print(loss)
                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
    print(f'Best val Acc: {best_acc:4f}')

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

EDIT: Model pipeline

model_ft = models.resnet18(weights='ResNet18_Weights.DEFAULT')
num_ftrs = model_ft.fc.in_features
# Here the size of each output sample is set to 1, since
# nn.BCEWithLogitsLoss() expects a single logit per sample.
model_ft.fc = nn.Linear(num_ftrs, 1)

model_ft = model_ft.to(device)

criterion = nn.BCEWithLogitsLoss()

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)

model_ft = train_model(model_ft, criterion, optimizer_ft, 
     exp_lr_scheduler,num_epochs=25)

The output of the training loop:

outputs shape:  torch.Size([4, 1])
labels shape:  torch.Size([4, 1])

loss: tensor(0.3511, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)

train Loss: 1.0000 Acc: 2.0164
val Loss: 1.0000 Acc: 1.8105

Would anyone be able to help me with this, please?

Thanks & Best Regards AMJS

  • Are you doing multi-class or multi-label classification? – geometrikal Apr 03 '23 at 04:01
  • Hi, binary classification please. Thanks – Alain Michael Janith Schroter Apr 03 '23 at 04:22
  • Can you make sure that `preds` and `labels` have the same shapes? – Ivan Apr 03 '23 at 06:58
  • I am concerned about the `output` variable. `output` is supposed to have the same shape as `label`. But in this case, the `output` shape is [4, 2] and the `label` shape is [4, 1] if the batch size is 4. – Druwa Apr 03 '23 at 11:15
  • Possibly `dataset_sizes` is incorrect. You could try doing your own counting in each phase loop to make sure you have the correct number of items. And in any case, you are not guaranteed to use every item in the dataset on every epoch, depending on the dataloader batching settings. – DerekG Apr 03 '23 at 13:15
  • Hi, thanks for the reply. Both the `outputs` variable and the `labels` variable have the same shape. Thanks – Alain Michael Janith Schroter Apr 04 '23 at 12:56
  • @AlainMichaelJanithSchroter You didn't include crucial pieces of your code, namely the places where the `dataloaders` and `dataset_sizes` dicts are computed. BTW, why are you comparing against `labels.data`? That's [deprecated since PyTorch 0.4.0](https://stackoverflow.com/questions/51743214/is-data-still-useful-in-pytorch#51744091). – Bernhard Stadler Apr 09 '23 at 18:19
  • Hi, I got the code from here: https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html. I included only the parts of the code that I changed in this question; the change was swapping the loss function from nn.CrossEntropyLoss() to nn.BCEWithLogitsLoss(). Thanks & Best Regards – Alain Michael Janith Schroter Apr 10 '23 at 08:08
  • Check the variable `image_datasets`. – Vishnudev Krishnadas Apr 10 '23 at 08:09

1 Answer


I don't understand why you are using `torch.max` when you have only one output. In any case, you should `squeeze` the labels before comparing, so this line:

running_corrects += torch.sum(preds == labels.data)

should become

running_corrects += torch.sum(preds == labels.squeeze())

To see why:

labels = torch.tensor([[0],
        [0],
        [0],
        [1]])

preds = torch.tensor([0, 0, 0, 0])           # shape [4]
print(torch.sum(preds == labels))            # broadcasting gives a [4, 4] comparison -> output 12
print(torch.sum(preds == labels.squeeze()))  # both shape [4] -> output 3
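As a side note, here is a minimal self-contained sketch (not taken from the question's code; the example values are illustrative) of how the prediction step could look with a single-logit output, using a sigmoid and a 0.5 threshold instead of torch.max:

import torch

outputs = torch.tensor([[-1.2], [0.4], [2.1], [-0.3]])  # example logits, shape [4, 1]
labels = torch.tensor([[0], [1], [1], [0]])              # example labels, shape [4, 1]

probs = torch.sigmoid(outputs)      # logits -> probabilities
preds = (probs > 0.5).long()        # threshold at 0.5, shape stays [4, 1]
print(torch.sum(preds == labels))   # shapes match, no broadcasting -> tensor(4)

With matching shapes, the element-wise comparison counts at most one correct prediction per sample, so the accuracy can no longer exceed 1.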
TanjiroLL
  • I agree with @coder00. Since you are doing binary classification, you can use a sigmoid and a threshold to distinguish between positive and negative labels. – DueSouth May 24 '23 at 08:52
  • Also note that if you're using the sigmoid with the current shapes in your code, you will not need the `squeeze()` call on the labels, if I'm not mistaken. – DueSouth May 24 '23 at 09:26