
Hello, I am implementing an LSTM for language modelling as homework and I am at the loss implementation phase. Our instructor told us to use F.nll_loss, but the sequences are padded and we have to take into account a given mask that tells us where each sequence stops.

Inputs:

  • log_probas (batch_size, sequence_length(padded), vocabulary size)
  • targets (batch_size, sequence_length(padded))
  • mask (batch_size, sequence_length(padded))

Naive implementation, which works but does not take the mask into account:

import torch.nn.functional as F
# transpose: nll_loss expects the class dimension in position 1, i.e. (batch, vocab, seq_len)
loss = F.nll_loss(log_probas.transpose(1, 2), targets)
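
For concreteness, a minimal self-contained version of that call with dummy tensors (the sizes here are made up purely for illustration) shows the problem: the mean is taken over every position, padded or not.

import torch
import torch.nn.functional as F

# Hypothetical sizes, purely for illustration.
batch_size, seq_len, vocab_size = 2, 5, 10

log_probas = F.log_softmax(torch.randn(batch_size, seq_len, vocab_size), dim=-1)
targets = torch.randint(0, vocab_size, (batch_size, seq_len))

loss = F.nll_loss(log_probas.transpose(1, 2), targets)
print(loss)  # a single scalar, averaged over *every* position, padding included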

I've been crawling the internet and banging my head, but I can't seem to find an answer on how to incorporate the mask into the averaging scheme of the loss.

1 Answer


You could flatten the tensors, use the mask to select the non-padded tokens, and compute the loss only on those:

import torch.nn.functional as F

# Flatten the batch and time dimensions, then keep only the positions
# where the mask is set (i.e. the non-padded tokens).
vocab_size = log_probas.size(-1)
log_probas = log_probas.view(-1, vocab_size)   # (batch * seq_len, vocab)
targets = targets.view(-1)                     # (batch * seq_len,)
mask = mask.view(-1).bool()                    # (batch * seq_len,)
loss = F.nll_loss(log_probas[mask], targets[mask])
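
An equivalent way to fold the mask into the averaging itself, starting again from the original (batch_size, seq_len, vocab) tensors and assuming the mask is 1 for real tokens and 0 for padding, is to ask nll_loss for per-token losses and take the mean yourself:

import torch.nn.functional as F

# One loss value per position, shape (batch_size, seq_len).
per_token_loss = F.nll_loss(log_probas.transpose(1, 2), targets, reduction='none')

# Zero out the padded positions and average over the real tokens only.
mask = mask.float()
loss = (per_token_loss * mask).sum() / mask.sum()

Both snippets compute the same value; this one just avoids the boolean indexing and keeps everything in the padded (batch_size, seq_len) layout.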
emily