
I have been training the textsum seq2seq-with-attention model for abstractive summarization on a training corpus of 600k articles + abstracts. Can this be regarded as convergence? If so, can it be right that it converged after fewer than, say, 5k steps? Considerations:

  • I've trained on a vocab size of 200k
  • 5k steps (until approximate convergence) with a batch size of 4 means that at most 20k distinct samples were seen. This is only a fraction of the entire training corpus.
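The back-of-envelope arithmetic in the bullets above can be sketched as follows (a sanity check only, not part of the textsum code; the numbers are the ones stated in the question):

```python
steps = 5_000          # steps until approximate convergence
batch_size = 4
corpus_size = 600_000  # articles + abstracts in the training corpus

# Upper bound on distinct samples seen; with shuffling, some may repeat.
samples_seen = steps * batch_size
fraction = samples_seen / corpus_size

print(samples_seen)               # 20000
print(f"{fraction:.1%} of the corpus")  # 3.3% of the corpus
```

So at most ~3% of the corpus has been seen, which is why genuine convergence this early would be surprising.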

Or am I actually not reading my dog's face in the tea leaves and is the marginal negative slope as expected?

[Figure: loss over training steps]

anthnyprschka
  • The model is clearly still learning. Smooth the curve out more to see it: after 5k steps the loss was around 6.2, now it is around 5.8. – lejlot Aug 20 '17 at 22:23
  • You don't happen to know what a benchmark for running_avg_loss at convergence could be, do you? I used the same hyperparameters as the textsum authors, yet my outputs are useless so far. Now I am evaluating whether this has something to do with my using a different dataset (NYT instead of Gigaword), whether some bugs were introduced into the model, or whether I am just too impatient and should let the model train *a lot* longer (or get a GPU, since I am training on a CPU at the moment, which seems horribly slow). – anthnyprschka Aug 29 '17 at 14:41
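lejlot's suggestion to smooth the curve can be sketched with a simple exponential moving average, similar to what TensorBoard's smoothing slider applies (this is a generic illustration, not code from textsum; the function name and weight are my own choices):

```python
def smooth(values, weight=0.99):
    """Exponential moving average over a list of loss values.

    weight close to 1.0 smooths aggressively; 0.0 leaves values unchanged.
    """
    smoothed, last = [], values[0]
    for v in values:
        last = weight * last + (1 - weight) * v
        smoothed.append(last)
    return smoothed

# Example: a noisy but slowly decreasing loss becomes visibly monotonic.
noisy = [6.4, 6.0, 6.3, 5.9, 6.1, 5.8]
print(smooth(noisy, weight=0.9))
```

With heavy smoothing, a curve that looks flat and noisy can reveal the slow downward trend lejlot describes (6.2 to 5.8 over thousands of steps).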

1 Answer


OK, so I switched to training on a GPU (instead of a CPU) and confirmed that the model was still learning. Here is the learning curve after initializing a completely new model:

[Figure: learning curve of the newly initialized model]

The speedup was roughly 30x, training on an AWS p2.xlarge (NVIDIA K80).

anthnyprschka