
I built a Transformer model for recovering text: the source text may contain redundant, missing, or wrong words, and my model has to correct as many of them as possible. I also want the model to learn the embedding of the correct sentence, so both the sources and the targets are sequences of embeddings. Therefore my loss function, Cross Entropy, takes two embedding sequences as input and target. In addition, this model is part of a larger model whose main criterion is Negative Log-Likelihood.

Unfortunately, the Cross Entropy loss drops below 0.0 after a few epochs, so the sum of the Cross Entropy and Negative Log-Likelihood terms goes below 0.0 as well. This prevents the whole model from converging.
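Here is a minimal sketch of what I think is happening (assuming PyTorch's `nn.CrossEntropyLoss`; the tensor values are made up for illustration). With a float target, the loss is computed as `-sum(target * log_softmax(input))`, which assumes the target is a probability distribution. An embedding vector can have negative components, so the loss can become negative:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# A valid float target is a probability distribution (non-negative, sums to 1).
# An embedding vector is neither, e.g. it can contain negative components.
logits = torch.zeros(1, 2)                # log_softmax gives log(0.5) per class
bad_target = torch.tensor([[-1.0, 0.0]])  # embedding-like target, not a distribution

loss = criterion(logits, bad_target)
print(loss.item())  # -(-1.0 * log 0.5) = log 0.5 ≈ -0.6931, i.e. negative
```

With class indices (or a proper probability distribution) as the target, the same criterion stays non-negative.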

I need help resolving this issue. Thanks in advance.
