The following code (extracted from SentEval) defines a neural network that maps 1024 real numbers to 5 output predictions. The task is to assess the relatedness between two sentences, each represented with 512 features. The relatedness is a number in [1,5]. I think that if the training relatedness scores were in {1,2,3,4,5}, cross entropy would be the better loss function, but since the training set contains real-valued relatedness scores in [1,5], MSE is used as the loss function.
Question: Since the network outputs 5 probabilities for each input, how is the MSE calculated between a single real number and 5 probabilities?
from torch import nn
inputdim = 1024
nclasses = 5
model = nn.Sequential(
    nn.Linear(inputdim, nclasses),
    nn.Softmax(dim=-1),
)
loss_fn = nn.MSELoss()
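
One way the shapes could line up, offered as a sketch under assumptions and not confirmed by the snippet above: the real-valued score may be converted into a 5-element target distribution before the loss is computed, so that MSE compares two vectors of the same length. The helper names `encode_score`, `mse`, and `expected_score` below are hypothetical and written in plain Python. The `mse` helper averages the squared differences over all elements, which is what `nn.MSELoss()` does with its default `reduction='mean'`.

```python
import math

def encode_score(score, nclasses=5):
    """Hypothetical encoding: spread a real score in [1, nclasses]
    over its two neighbouring integer classes."""
    target = [0.0] * nclasses
    low = math.floor(score)          # lower integer class (1-based)
    frac = score - low               # distance above the lower class
    target[low - 1] = 1.0 - frac
    if frac > 0:                     # guard: score == nclasses has no upper neighbour
        target[low] = frac
    return target

def mse(pred, target):
    """Mean of squared element-wise differences over all elements."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def expected_score(probs):
    """Map a 5-element distribution back to a single score in [1, 5]."""
    return sum((k + 1) * p for k, p in enumerate(probs))

target = encode_score(3.6)           # mass split between classes 3 and 4
probs = [0.0, 0.1, 0.4, 0.4, 0.1]    # made-up softmax output, for illustration only
loss = mse(probs, target)
```

Under this assumed encoding, the expected value of the target distribution recovers the original score, and a predicted score can be read off the softmax output in the same way with `expected_score(probs)`.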