Getting nan loss in Adam Model using PyTorch

Question

I am new to training neural nets, so please forgive me if this is a naive question or breaks any of Stack Overflow's unwritten rules. I recently started working on the Titanic data set. After cleaning the data, I built a features tensor by concatenating the normalized continuous columns with the one-hot tensors of the categorical columns. I pass this tensor into a simple model made of linear layers, and I get a nan loss in every epoch.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from tqdm import tqdm
import pickle
import pathlib

path = pathlib.Path('./drive/My Drive/Kaggle/Titanic')

with open(path/'feature_tensor.pickle', 'rb') as f:
    features = pickle.load(f)

with open(path/'label_tensor.pickle', 'rb') as f:
    labels = pickle.load(f)

features = features.float()
labels = labels.float()

import math
valid_size = -1 * math.floor(0.2*len(features))

train_features = features[:valid_size]
valid_features = features[valid_size:]

train_labels = labels[:valid_size]
valid_labels = labels[valid_size:]

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.h_l1 = nn.Linear(18, 64)
        self.h_l2 = nn.Linear(64, 32)
        self.o_l = nn.Linear(32, 2)

    def forward(self, x):
        x = F.relu(self.h_l1(x))
        x = F.relu(self.h_l2(x))
        return self.o_l(x)

model = Model()
model.to('cuda')

optimizer = optim.Adam(model.parameters())
loss_fn = nn.MSELoss()

EPOCHS = 5
BATCH_SIZE = 20

for EPOCH in range(0, EPOCHS):
    for i in tqdm(range(0, len(features), BATCH_SIZE)):
        train_feature_batch = train_features[i:i+BATCH_SIZE,:].to('cuda')
        train_label_batch = train_labels[i:i+BATCH_SIZE,:].to('cuda')
        valid_feature_batch = valid_features[i:i+BATCH_SIZE,:].to('cuda')
        valid_label_batch = valid_labels[i:i+BATCH_SIZE,:].to('cuda')
        train_loss = loss_fn(model(train_feature_batch), train_label_batch)
        with torch.no_grad():
            valid_loss = loss_fn(model(valid_feature_batch), valid_label_batch)
        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()
    print(f"Epoch : {EPOCH}\tTrain Loss : {train_loss}\tValid_loss : {valid_loss}\n")

I am getting the following output:

100%|██████████| 45/45 [00:00<00:00, 511.50it/s]
100%|██████████| 45/45 [00:00<00:00, 604.10it/s]
100%|██████████| 45/45 [00:00<00:00, 586.21it/s]
  0%|          | 0/45 [00:00<?, ?it/s]Epoch : 0 Train Loss : nan    Valid_loss : nan

Epoch : 1   Train Loss : nan    Valid_loss : nan

Epoch : 2   Train Loss : nan    Valid_loss : nan

100%|██████████| 45/45 [00:00<00:00, 555.55it/s]
100%|██████████| 45/45 [00:00<00:00, 607.65it/s]Epoch : 3   Train Loss : nan    Valid_loss : nan

Epoch : 4   Train Loss : nan    Valid_loss : nan

Yes, the output really is scattered like this. Please help.
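
One possible cause, judging only from the code shown: the inner loop steps over len(features), while the slices are taken from train_features and valid_features, which are shorter. The 45 iterations per epoch at a batch size of 20 are consistent with that. Any slice that starts past the end of a tensor is empty, and nn.MSELoss with its default mean reduction returns nan on an empty batch. The sketch below reproduces this with random stand-in data. The row count (891), the label width (2 columns), and the single nn.Linear layer are assumptions made for illustration, not the asker's actual tensors or model.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in data: row count and label width are assumptions, not the real tensors.
features = torch.randn(891, 18)
labels = torch.rand(891, 2)

# Same split arithmetic as in the question.
valid_size = -1 * math.floor(0.2 * len(features))
train_features, train_labels = features[:valid_size], labels[:valid_size]

model = nn.Linear(18, 2)  # simplified stand-in for the question's Model
loss_fn = nn.MSELoss()
BATCH_SIZE = 20

# Same indexing as the question: the range is built from len(features).
starts = list(range(0, len(features), BATCH_SIZE))
last = starts[-1]
empty_batch = train_features[last:last + BATCH_SIZE, :]
empty_loss = loss_fn(model(empty_batch), train_labels[last:last + BATCH_SIZE, :])

# Bounding the range by the tensor that is actually sliced avoids empty batches.
safe_starts = list(range(0, len(train_features), BATCH_SIZE))
safe_last = safe_starts[-1]
safe_loss = loss_fn(
    model(train_features[safe_last:safe_last + BATCH_SIZE, :]),
    train_labels[safe_last:safe_last + BATCH_SIZE, :],
)

print(len(starts), tuple(empty_batch.shape), empty_loss.item(), safe_loss.item())
```

If this is what is happening, the nan printed after each epoch comes from the last (empty) batch rather than from the model diverging. The validation slices would need their own range over len(valid_features) for the same reason.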

Do the predictions and labels have the same shape? Also, consider using PyTorch's Dataset and DataLoader API. I'm not criticizing, but hand-rolled batching like this is exactly what PyTorch was designed to spare you, and the code could be a lot cleaner with Dataset and DataLoader. – basilisk, Dec 11 '19 at 17:05
Sorry, I am just a beginner here. I will definitely try out DataLoader and Dataset. – Koushik Sahu, Dec 11 '19 at 19:05
No problem, just take a look here: https://pytorch.org/docs/stable/data.html – basilisk, Dec 11 '19 at 19:12
I read the book "Deep Learning with PyTorch" and learnt everything from there. – Koushik Sahu, Dec 11 '19 at 19:15
I tried passing a single batch of 20 features through the model and computing the output and loss, and the values were fine. When I do the same over the whole dataset, across epochs, I get this nan loss. – Koushik Sahu, Dec 11 '19 at 19:17
OK, then I would start with a small network: one hidden layer with a small number of neurons, maybe equal to the number of features or twice that, and see what output you get. – basilisk, Dec 11 '19 at 19:21
Okay, I'll try that; maybe it will work. Making a Dataset and a DataLoader seems tough, though: you have to subclass Dataset, override its methods, and then pass it to DataLoader. It seems a bit overwhelming. Can you point me to a source that explains it simply? – Koushik Sahu, Dec 11 '19 at 19:24
There are examples in the tutorials on the official website, and it's pretty straightforward. You don't have to write a Dataset; you can use just the DataLoader. Concatenate your features and labels column-wise, so that every row holds the features followed by the corresponding label, by calling the NumPy or pandas concatenation function with axis=1. You can then pass that concatenated data to the DataLoader. – basilisk, Dec 11 '19 at 19:36
I used Linear(18, 32) as the hidden layer and Linear(32, 2) as the output. It gave nan again. Any thoughts? – Koushik Sahu, Dec 11 '19 at 19:46
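
On the Dataset and DataLoader suggestion in the comments: when the data are already tensors, subclassing is not strictly needed, because torch.utils.data.TensorDataset can wrap them directly. The following is a minimal sketch under stated assumptions, not a tuned solution. The data are random stand-ins (891 rows and 2 label columns are assumed), run_epoch is a hypothetical helper name, and the network mirrors the Linear(18, 32) and Linear(32, 2) layers mentioned in the last comment.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Stand-in data: shapes are assumptions, not the asker's pickled tensors.
features = torch.randn(891, 18)
labels = torch.rand(891, 2)

# Hold out the last 20% of rows for validation, as in the question.
n_valid = int(0.2 * len(features))
train_ds = TensorDataset(features[:-n_valid], labels[:-n_valid])
valid_ds = TensorDataset(features[-n_valid:], labels[-n_valid:])

# Each loader only yields batches that exist, so no batch is ever empty.
train_dl = DataLoader(train_ds, batch_size=20, shuffle=True)
valid_dl = DataLoader(valid_ds, batch_size=20)

model = nn.Sequential(nn.Linear(18, 32), nn.ReLU(), nn.Linear(32, 2)).to(device)
optimizer = optim.Adam(model.parameters())
loss_fn = nn.MSELoss()


def run_epoch(loader, train):
    """Return the mean per-sample loss over one pass; update weights if train."""
    model.train(train)
    total, count = 0.0, 0
    with torch.set_grad_enabled(train):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            loss = loss_fn(model(x), y)
            if train:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            # Weight each batch by its size so the short last batch counts fairly.
            total += loss.item() * len(x)
            count += len(x)
    return total / count


for epoch in range(2):
    train_loss = run_epoch(train_dl, train=True)
    valid_loss = run_epoch(valid_dl, train=False)
    print(f"Epoch : {epoch}\tTrain Loss : {train_loss}\tValid_loss : {valid_loss}")
```

Averaging over the whole epoch, instead of printing the loss of the final batch, also makes the reported number less sensitive to whatever that last batch happens to contain.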
