
In the `train_epoch` function we have three kinds of losses:

  1. `loss`
  2. `batch_loss`
  3. `train_loss`

As I understand it, `loss` is a tensor, `batch_loss` is the value of that tensor, and `train_loss` is the cumulative sum of the `batch_loss` values. That part is clear to me.
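
To make sure I have the distinction right, here is a minimal sketch of what I mean (plain PyTorch with a made-up model and data, not the actual AllenNLP trainer):

```python
import torch

# Toy model and data, just to illustrate the three quantities.
model = torch.nn.Linear(4, 1)
loss_fn = torch.nn.MSELoss()
data = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(5)]

train_loss = 0.0                            # cumulative float over the epoch
for inputs, targets in data:
    loss = loss_fn(model(inputs), targets)  # `loss`: a tensor with grad history
    batch_loss = loss.item()                # `batch_loss`: that tensor's value, a float
    train_loss += batch_loss                # `train_loss`: running sum of the batch losses
```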

My question is: why does AllenNLP compute `batch_loss` per batch in the `for batch` loop, rather than computing a cumulative loss for the whole `batch_group`?

Also, I don't understand the need for `batch_group` inside the epoch, and `batch` inside `batch_group`.

My understanding is that inside an epoch we have `batch_group`, and inside `batch_group` we have `batch`. Why is `batch_loss` calculated per `batch` and not per `batch_group`?

Arij Aladel
  • In this line https://github.com/allenai/allennlp/blob/0ad228d4cf7dee4bb782026de272866819d44654/allennlp/training/trainer.py#L725 I think there is a bug: shouldn't we first accumulate the batch loss and then, after finishing the for loop, add `batch_reg_loss` to the total `train_reg_loss`, as was done with `batch_loss`? – Arij Aladel Oct 28 '20 at 06:05

1 Answer


My question is: why does AllenNLP compute `batch_loss` per batch in the `for batch` loop, rather than computing a cumulative loss for the whole `batch_group`?

This is actually a bug, so thanks for pointing that out! There is a PR open now to fix it: https://github.com/allenai/allennlp/pull/4706

Also, I don't understand the need for `batch_group` inside the epoch, and `batch` inside `batch_group`.

`batch_group` always consists of just a single batch unless `num_gradient_accumulation_steps` is greater than 1, i.e. you're using gradient accumulation, which is a method for getting a larger effective batch size.

See https://medium.com/ai2-blog/tutorial-training-on-larger-batches-with-less-memory-in-allennlp-1cd2047d92ad, for example.
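
Here is a minimal sketch of gradient accumulation in plain PyTorch (the toy model, data, and grouping helper are all made up, not AllenNLP's trainer internals): each `batch_group` holds `num_gradient_accumulation_steps` batches, each batch's `loss` is divided by the group size and backpropagated, and the optimizer steps once per group.

```python
import torch
from itertools import islice

def batch_groups(batches, group_size):
    """Yield lists of up to `group_size` consecutive batches."""
    it = iter(batches)
    while group := list(islice(it, group_size)):
        yield group

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()
batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(6)]

num_gradient_accumulation_steps = 3          # each batch_group holds 3 batches
for batch_group in batch_groups(batches, num_gradient_accumulation_steps):
    optimizer.zero_grad()
    for inputs, targets in batch_group:      # batch inside batch_group
        loss = loss_fn(model(inputs), targets) / len(batch_group)
        loss.backward()                      # gradients add up in each .grad
    optimizer.step()                         # one parameter update per batch_group
```

The effective batch size per optimizer step is the per-batch size times `num_gradient_accumulation_steps` (8 × 3 = 24 examples in this sketch).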

petew
  • Now I have another question related to backward: on which of these three losses is it better to call backward, and why? Is it OK to use `batch_loss.backward()`? – Arij Aladel Oct 09 '20 at 10:51
  • And why are we dividing by `len(batch_group)`? – Arij Aladel Oct 09 '20 at 12:13
  • `batch_loss` just tracks the loss of a batch group, but it's not a tensor, it's only a float, so we can't call `.backward()` on it. Instead, we call `.backward()` on the `loss` tensor of every batch, accumulating the gradients until we have exhausted the batch group, at which point we call `step()` on our optimizer. We divide `loss` by `len(batch_group)` because we want to average the `loss` over the batch group (see the sketch after these comments). – petew Oct 09 '20 at 16:10
  • Yes, I understand that `batch_loss` is a float; we could have kept it as a tensor anyway and called backward on it, which is why I asked. – Arij Aladel Oct 09 '20 at 18:22
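
A small numeric check of the point in the last two comments (plain PyTorch, made-up numbers): accumulating the gradients of `loss / len(batch_group)` over a group gives the same gradients as backpropagating the average loss of one larger effective batch, and a float like `batch_loss` has no `.backward()` method at all.

```python
import torch

# A single weight vector and two tiny "batches", purely made up for the check.
w = torch.tensor([1.0, -2.0], requires_grad=True)
batches = [torch.tensor([3.0, 1.0]), torch.tensor([-1.0, 4.0])]

# Gradient accumulation: backward on each per-batch loss divided by the group size.
for x in batches:
    loss = (w * x).sum() / len(batches)
    loss.backward()                          # gradients accumulate in w.grad
accumulated = w.grad.clone()

# One "effective" batch: backward on the average of the per-batch losses.
w.grad = None
mean_loss = sum((w * x).sum() for x in batches) / len(batches)
mean_loss.backward()

print(torch.allclose(accumulated, w.grad))   # True: the two are equivalent
# Note: loss.item() returns a plain Python float, which has no .backward() method.
```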