
I am using a neural network to classify sequences of length 340 into 8 classes, with cross entropy as the loss. I am getting a very high number for the loss, and I am wondering whether I made a mistake in calculating the loss for each epoch, or whether I should use a different loss function.

criterion = nn.CrossEntropyLoss()
if CUDA:
    criterion = criterion.cuda()
optimizer = optim.SGD(model.parameters(), lr=LEARNING_RATE, momentum=0.9)
loss_list = []                                                                                                                      
for epoch in range(N_EPOCHES):                                                                                                          
    tot_loss=0                                                                                                                          
    running_loss =0                                                                                                                     
    model.train()                                                                                                                       
    loss_values = []                                                                                                                    
    acc_list = []                                                                                                                       
    acc_list = torch.FloatTensor(acc_list)                                                                                              
    sum_acc = 0                                                                                                                         
    # Training                                                                                                                                                                                                                                                        
    for i, (seq_batch, stat_batch) in enumerate(training_generator):                                                                    
        # Transfer to GPU                                                                                                               
        seq_batch, stat_batch = seq_batch.to(device), stat_batch.to(device)                                                             
        optimizer.zero_grad()                                                                                                           
        # Model computation                                                                                                             
        seq_batch = seq_batch.unsqueeze(-1)                                                                                                                                                                                                                          
        outputs = model(seq_batch)                                                                                                                                                                                                                                                                                                                                                            
        loss = criterion(outputs, stat_batch.argmax(1))  # CrossEntropyLoss expects raw logits and class-index targets
        loss.backward()                                                                                                                 
        optimizer.step()                                                                                                                
        # print statistics                                                                                                              
        running_loss += loss.item()*seq_batch.size(0)                                                                                   
        loss_values.append(running_loss/len(training_set))                                                                                                                                                                                                                                                   
        if i % 2000 == 1999:  # print every 2000 mini-batches                                                                           
            print('[%d, %5d] loss: %.3f' %                                                                                              
                  (epoch + 1, i + 1, running_loss / 50000),"acc",(outputs.argmax(1) == stat_batch.argmax(1)).float().mean())            
            running_loss = 0.0                                                                                                          
        sum_acc += (outputs.argmax(1) == stat_batch.argmax(1)).float().sum()                                                            
    print("epoch" , epoch, "acc", sum_acc/len(training_generator))                                                                                                                                                                                                                                                                                                                              
print('Finished Training')                                                                                                              
                                                                                                                                        
[1,  2000] loss: 14.205 acc tensor(0.5312, device='cuda:0')
[1,  4000] loss: 13.377 acc tensor(0.4922, device='cuda:0')
[1,  6000] loss: 13.159 acc tensor(0.5508, device='cuda:0')
[1,  8000] loss: 13.050 acc tensor(0.5547, device='cuda:0')
[1, 10000] loss: 12.974 acc tensor(0.4883, device='cuda:0')
epoch 1 acc tensor(133.6352, device='cuda:0')
[2,  2000] loss: 12.833 acc tensor(0.5781, device='cuda:0')
[2,  4000] loss: 12.834 acc tensor(0.5391, device='cuda:0')
[2,  6000] loss: 12.782 acc tensor(0.5195, device='cuda:0')
[2,  8000] loss: 12.774 acc tensor(0.5508, device='cuda:0')
[2, 10000] loss: 12.762 acc tensor(0.5156, device='cuda:0')
epoch 2 acc tensor(139.2496, device='cuda:0')
[3,  2000] loss: 12.636 acc tensor(0.5469, device='cuda:0')
[3,  4000] loss: 12.640 acc tensor(0.5469, device='cuda:0')
[3,  6000] loss: 12.648 acc tensor(0.5508, device='cuda:0')
[3,  8000] loss: 12.637 acc tensor(0.5586, device='cuda:0')
[3, 10000] loss: 12.620 acc tensor(0.6016, device='cuda:0')
epoch 3 acc tensor(140.6962, device='cuda:0')
[4,  2000] loss: 12.520 acc tensor(0.5547, device='cuda:0')
[4,  4000] loss: 12.541 acc tensor(0.5664, device='cuda:0')
[4,  6000] loss: 12.538 acc tensor(0.5430, device='cuda:0')
[4,  8000] loss: 12.535 acc tensor(0.5547, device='cuda:0')
[4, 10000] loss: 12.548 acc tensor(0.5820, device='cuda:0')
epoch 4 acc tensor(141.6522, device='cuda:0')

1 Answer

I am getting very high number for the loss

What makes you think this is high? What do you compare this to?

Yes, you should use `nn.CrossEntropyLoss` for multi-class classification tasks. And your training loss seems perfectly fine to me. At initialization, you should have loss = -log(1/8) ≈ 2.08.
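
A quick way to see this (a minimal sketch, not taken from the question's code): feed near-zero random logits for 8 classes into `nn.CrossEntropyLoss`, which is roughly what an untrained network produces, and the loss lands right at -log(1/8).

import math
import torch
import torch.nn as nn

# With near-uniform predictions over 8 classes, cross entropy ≈ -log(1/8)
criterion = nn.CrossEntropyLoss()
logits = 0.01 * torch.randn(1000, 8)      # near-zero logits, like a fresh init
targets = torch.randint(0, 8, (1000,))    # arbitrary class indices
print(criterion(logits, targets).item())  # ≈ 2.08
print(-math.log(1 / 8))                   # 2.0794...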

Ivan
  • Thanks for replying @Ivan. As a beginner, I don't really know how to interpret those numbers; I thought they would be between 0 and 1. Can I ask how I can know the accuracy of my model, i.e. whether it really works well for my dataset? Do you mean I should initialize running_loss with -log(1/8)? – No Na Jan 31 '21 at 12:17
  • *"Do you mean to initialize running_loss with -log(1/8)"* no, `-log(1/8)` is the value it's meant to have on init (with random weights) since you have *8* classes and are using `nn.CrossEntropyLoss`. You can get the accuracy with `(outputs.argmax(1) == stat_batch.argmax(1)).float().mean()`, basically counting the number of correct class predictions. – Ivan Jan 31 '21 at 13:06
  • @NoNa -log(1/8) indicates that the cross entropy between the prediction and target is maximal (i.e. the same as the entropy of a uniform distribution). Basically this is the number you should expect if the network prediction has nothing to do with the target labels. – jodag Jan 31 '21 at 13:06
  • @Ivan I got `[14, 340000] loss: 1.931 acc tensor(0.1250, device='cuda:0')`. What does that accuracy mean? I am now in epoch 14 and the loss is still in the same range. Does that mean overfitting? Also, how can I know if my model is good? Should I do some kind of test? – No Na Feb 01 '21 at 12:38
  • @NoNa That's exactly what you should get on init: if you adopt the uniform policy to predict the class, you will be right `1/8*100 = 12.5`% of the time, *i.e.* the model is 12.5% accurate. Your objective is to train the model such that the accuracy is as close to 100% as possible. The accuracy is literally the number of correct predictions over the total number of predictions. – Ivan Feb 01 '21 at 12:41
  • @Ivan So I keep changing the model until I maximize the accuracy, right? Then I can test it? I also want to ask: I am getting loss and acc many times for the same epoch, as you can see in my code. How can I have only one acc and loss per epoch? Thanks a lot, I appreciate your answers. – No Na Feb 01 '21 at 17:30
  • @Ivan Also, I am wondering if this line is correct: `loss = criterion(outputs, stat_batch.argmax(1))`, since stat_batch is a long tensor containing only binary values and outputs are numbers in [0, 1]. Should I round outputs, or does the cross entropy do that? I also face that problem when I try to predict on the test data. – No Na Feb 01 '21 at 17:45
  • Did you use an activation as the last layer of your model? As for `nn.CrossEntropyLoss`, it handles everything as long as you give it raw logits: it will apply a softmax then compute the negative log-likelihood loss. Targets are class indices and outputs are logits. – Ivan Feb 01 '21 at 18:01
  • @Ivan Yes, I apply Softmax on the last layer (I also tried LogSoftmax). I mean for calculating the accuracy here: `(outputs.argmax(1) == stat_batch.argmax(1)).float().mean()`. Does that cause a problem in the comparison, since outputs are values in [0, 1] and stat is binary? – No Na Feb 02 '21 at 12:55
  • You shouldn't have a softmax, nor a log-softmax layer if you use `nn.CrossEntropyLoss`. The point is to select the highest class index from `outputs` (*i.e.* the class predicted with the highest probability) and from `stat_batch` (basically the component which equals *1*). – Ivan Feb 02 '21 at 13:00
  • Okay, I removed it; now my last layer is a fully connected layer. But I am wondering why the acc always gives me the same numbers, repeated in every epoch: 0.2500, 0.5, 0.3750, 012... – No Na Feb 02 '21 at 15:14
  • What is your batch size? If it's one, then it's likely to be very volatile. – Ivan Feb 02 '21 at 17:53
  • batch size is 8 – No Na Feb 02 '21 at 18:38
  • Even so, the accuracy at *t* will only depend on *8* predictions. You could count the number of correct predictions over the whole epoch, which would lead to more stable tracking of the accuracy. – Ivan Feb 02 '21 at 18:40
  • @Ivan So to count the number of predictions over the whole epoch, I added this line in the nested for loop: `sum_acc += (outputs.argmax(1) == stat_batch.argmax(1)).float().mean()` and sum_acc is zero. Is that right, or should I divide it by something? – No Na Feb 04 '21 at 03:59
  • I think you have a problem with your outputs or targets, did you check them with a `print`? – Ivan Feb 04 '21 at 08:08
  • @Ivan Sorry, I think I explained it wrong. I mean I initialized sum_acc = 0 and added the line `sum_acc += (outputs.argmax(1) == stat_batch.argmax(1)).float().mean()` in the nested for loop to count the correct predictions. It gives me numbers, but I am asking whether that is right or whether I should divide sum_acc by something. – No Na Feb 04 '21 at 09:02
  • You're adding averages together, which won't be accurate. You should do `sum_acc += (outputs.argmax(1) == stat_batch.argmax(1)).float().sum()` and divide by the `training_generator` size. – Ivan Feb 04 '21 at 10:29
  • @Ivan I did that but I am getting weird numbers for the accuracy. I updated the code and the results in the question. Can you please take a look and tell me if I did something wrong? – No Na Feb 04 '21 at 12:05
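
For reference, here is a minimal sketch of the per-epoch accuracy bookkeeping discussed in these comments. It reuses the question's names (model, device, training_generator, training_set, one-hot stat_batch) and is only an illustration of the idea, not a verified drop-in: correct predictions are summed over the whole epoch and then divided by the number of samples rather than the number of batches, which keeps the value in [0, 1].

import torch

model.eval()
correct = 0
with torch.no_grad():
    for seq_batch, stat_batch in training_generator:
        seq_batch = seq_batch.to(device).unsqueeze(-1)
        stat_batch = stat_batch.to(device)
        outputs = model(seq_batch)  # raw logits, no softmax layer
        correct += (outputs.argmax(1) == stat_batch.argmax(1)).sum().item()

# divide by the number of samples, not len(training_generator) (number of batches)
epoch_acc = correct / len(training_set)
print("epoch accuracy:", epoch_acc)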