Suppose we have a problem with 100 images and a batch size of 15. Every batch then contains 15 images, except the last one, which contains only 10.
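For example, a quick sketch (using a dummy 100-sample dataset in place of the real train_set, just to confirm the batch sizes):

import torch

dummy_set = torch.utils.data.TensorDataset(torch.randn(100, 1))
loader = torch.utils.data.DataLoader(dummy_set, batch_size=15)
print([len(images) for (images,) in loader])
# [15, 15, 15, 15, 15, 15, 10]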
Suppose the network is trained as follows:

import torch
import torch.optim as optim
import torch.nn.functional as F

network = Network()
optimizer = optim.Adam(network.parameters(), lr=0.001)

for epoch in range(5):
    total_loss = 0
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=15)
    for batch in train_loader:
        images, labels = batch
        pred = network(images)
        loss = F.cross_entropy(pred, labels)  # default reduction='mean'
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * 15        # multiply mean loss by hard-coded batch size
Isn't the last batch always going to give us an inflated contribution to the loss, because we multiply by 15 where we should have multiplied by 10? Shouldn't it be

total_loss += loss.item() * len(images)

instead of hard-coding 15 (or batch_size)?
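To make the distortion concrete, here is a toy calculation (with a hypothetical per-sample loss of 1.0, purely for the arithmetic):

# assume every sample has loss 1.0, so every batch mean is also 1.0
batch_sizes = [15] * 6 + [10]                 # 100 images total
wrong = sum(1.0 * 15 for _ in batch_sizes)    # 105.0 - last batch over-weighted
right = sum(1.0 * n for n in batch_sizes)     # 100.0 - weighted by true batch size
print(wrong / 100, right / 100)               # 1.05 vs 1.0 average loss per image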
Alternatively, can we sum the per-sample losses directly:

for epoch in range(5):
    total_loss = 0
    for images, labels in train_loader:
        pred = network(images)
        # reduction='sum' returns the summed loss over the batch, not the mean
        loss = F.cross_entropy(pred, labels, reduction='sum')
        total_loss += loss.item()
    avg_loss_per_epoch = total_loss / len(train_set)
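A minimal sanity check (on hypothetical random tensors) that the two accumulation schemes agree on a single batch:

import torch
import torch.nn.functional as F

pred = torch.randn(10, 3)                # hypothetical logits for a 10-sample batch
labels = torch.randint(0, 3, (10,))
mean_way = F.cross_entropy(pred, labels).item() * len(pred)      # mean * batch size
sum_way = F.cross_entropy(pred, labels, reduction='sum').item()  # summed directly
print(abs(mean_way - sum_way) < 1e-4)    # True, up to float rounding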
Can someone please explain whether multiplying by batch_size is a good idea, and where my reasoning goes wrong?