
I'm trying to compute MSELoss when a mask is used. Suppose my target is a tensor of shape [2, 33, 1] (batch size 2), and I have an input tensor of the same shape. Since the sequence length can differ per instance, I also have a binary mask indicating which elements of the input sequence actually exist. So here is what I'm doing:

import torch
import torch.nn as nn

mse_loss = nn.MSELoss(reduction='none')

loss = mse_loss(input, target)             # element-wise squared errors, shape [2, 33, 1]
masked_loss = (loss * mask.float()).sum()  # sum of squared errors over unmasked elements

mse_loss_val = masked_loss / loss.numel()  # average over ALL elements, masked or not

# now doing backpropagation
mse_loss_val.backward()

Is masked_loss / loss.numel() a good practice? I'm skeptical. Since I have to use reduction='none', I think the final loss should be computed only over the unmasked (nonzero) elements, yet here I'm averaging over all tensor elements via loss.numel(). I'm essentially trying to account for the 1/n factor in MSELoss. Any thoughts?
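For concreteness, a minimal sketch of how such input, target, and mask tensors might be set up (the per-instance sequence lengths here are hypothetical, just for illustration):

import torch

batch_size, max_len = 2, 33
input = torch.randn(batch_size, max_len, 1, requires_grad=True)  # model predictions
target = torch.randn(batch_size, max_len, 1)                     # ground truth

lengths = torch.tensor([33, 20])  # hypothetical sequence lengths, padded to max_len
# binary mask: 1 for real elements, 0 for padding; shape [2, 33, 1]
mask = (torch.arange(max_len)[None, :] < lengths[:, None]).unsqueeze(-1)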

inverted_index

1 Answer


There is an issue in your code: you should divide by the number of unmasked elements, not by the total element count. I think the correct code should be:

mse_loss = nn.MSELoss(reduction='none')

loss = mse_loss(input, target)           # element-wise squared errors
loss = (loss * mask.float()).sum()       # sum of squared errors over unmasked elements

non_zero_elements = mask.sum()           # number of unmasked elements
mse_loss_val = loss / non_zero_elements  # mean over unmasked elements only

# now doing backpropagation
mse_loss_val.backward()

If you are worried about numerical errors, summing first and then dividing like this is only slightly worse than using .mean().
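One way to convince yourself the formula is right: with an all-ones mask, the masked average must coincide with the built-in mean reduction. A minimal check, with shapes assumed as in the question:

import torch
import torch.nn as nn

torch.manual_seed(0)
input = torch.randn(2, 33, 1)
target = torch.randn(2, 33, 1)
mask = torch.ones(2, 33, 1)  # nothing masked out

loss = nn.MSELoss(reduction='none')(input, target)
masked_mean = (loss * mask).sum() / mask.sum()

# with an all-ones mask, the masked mean equals reduction='mean'
assert torch.allclose(masked_mean, nn.MSELoss(reduction='mean')(input, target))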

Umang Gupta
    Thanks! I was thinking of this solution before as well, because it doesn't make sense to me to take the average over all elements (zero and non-zero). – inverted_index May 03 '20 at 21:00