Here are my loss and accuracy after each epoch. It's a sequence-to-sequence model with 4 input and output tokens.
If I were to implement early stopping, where would I stop training? Dev loss and accuracy haven't started to get worse yet, so what would you do?
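To be concrete about what I mean by early stopping, here is a minimal sketch of the usual patience-based rule on dev loss. The function name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the loss values in the usage lines are all hypothetical and only there for illustration; they are not the numbers from my run.

```python
def early_stopping_epoch(dev_losses, patience=3, min_delta=0.0):
    """Patience-based early stopping on a list of per-epoch dev losses.

    Returns (stop_epoch, best_epoch), both 0-indexed.
    stop_epoch is None if the stopping rule never triggers.
    """
    best_loss = float("inf")
    best_epoch = None
    bad_epochs = 0  # consecutive epochs without sufficient improvement

    for epoch, loss in enumerate(dev_losses):
        if loss < best_loss - min_delta:
            # Dev loss improved: remember this epoch and reset the counter.
            best_loss = loss
            best_epoch = epoch
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                # No improvement for `patience` epochs in a row: stop here
                # and fall back to the checkpoint saved at best_epoch.
                return epoch, best_epoch

    return None, best_epoch


# Made-up dev losses, not taken from my training run.
print(early_stopping_epoch([1.0, 0.8, 0.7, 0.72, 0.71, 0.73], patience=3))
print(early_stopping_epoch([1.0, 0.9, 0.8], patience=3))
```

Under this rule, a dev loss that is still improving every epoch (the second call) never triggers a stop, which seems to be the situation I'm in.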
Thanks.