I have been training the textsum seq2seq-with-attention model for abstractive summarization on a training corpus of 600k articles + abstracts. Can this be regarded as convergence? If so, can it be right that it converged after fewer than, say, 5k steps? Considerations:
- I've trained with a vocabulary size of 200k
- 5k steps (until approximate convergence) with a batch size of 4 means that at most 20k distinct samples were seen, which is only a small fraction of the entire training corpus (see the quick arithmetic check below)
Or am I actually just reading tea leaves here, and is the marginal negative slope of the loss exactly what one should expect?
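For reference, here is the arithmetic behind the second consideration as a minimal sketch. It assumes one optimizer step consumes exactly one batch (no gradient accumulation) and that samples are drawn without replacement, so the sample count is an upper bound:

```python
# Upper bound on corpus coverage after ~5k steps.
# Assumptions: 1 step = 1 batch, no repeated samples within the window.

corpus_size = 600_000   # articles + abstracts in the training set
batch_size = 4
steps = 5_000

samples_seen = steps * batch_size             # at most 20,000 samples
fraction_of_epoch = samples_seen / corpus_size

print(f"samples seen: {samples_seen:,}")                   # 20,000
print(f"fraction of one epoch: {fraction_of_epoch:.1%}")   # ~3.3%
```

So by 5k steps the model has seen at most ~3.3% of the corpus, i.e. nowhere near a full epoch, which is why apparent convergence that early seems suspicious to me.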