textsum does not converge

Question

I have trained textsum for 5 days with the parameters recommended on the project page, using a training set of more than 3 million article-summary pairs.

At first, running_average_loss decreased slowly from around 9 to around 4. After that, it started fluctuating over a wide range: sometimes it is above 5, and sometimes it is as low as 1. When I test the model on some articles from the training set, the output is far from the reference summary. Can someone share their experience?

I have the following questions:

running_average_loss is less than 10 every time I run. Is that normal?
Is this overfitting, given that running_average_loss varies over a wide range and shows no sign of converging?
How long does it take to train a good enough model, and when should I stop training? Is there a signal that indicates when to stop?

score 0 · Answer 1 · answered Nov 07 '17 at 12:07

I don't think you have trained enough. The graph shows 50K steps, and even with a batch size of 64 the network has seen at most 50K * 64 samples. That is about 3.2 million samples, so at best the network has seen your 3 million+ pairs roughly once, and it has not even seen them all once if the batch size is smaller. You need multiple passes over the same samples for it to converge better.
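To make the arithmetic above concrete, here is a minimal sketch that converts training steps into passes over the data. The function names are hypothetical, the dataset size of 3,000,000 is a round stand-in for "more than 3 million", and the batch size of 4 is only an illustrative smaller value, not something stated in the question.

```python
import math


def passes_over_data(steps, batch_size, dataset_size):
    """Return how many passes (epochs) over the dataset the given steps cover."""
    samples_seen = steps * batch_size
    return samples_seen / dataset_size


def steps_for_epochs(epochs, batch_size, dataset_size):
    """Return the number of steps needed to cover the dataset `epochs` times."""
    return math.ceil(epochs * dataset_size / batch_size)


if __name__ == "__main__":
    dataset_size = 3_000_000  # assumed round figure for "more than 3 million"

    # 50K steps at batch size 64: about one pass over the data.
    print(round(passes_over_data(50_000, 64, dataset_size), 2))

    # 50K steps at a smaller, illustrative batch size: a small fraction of one pass.
    print(round(passes_over_data(50_000, 4, dataset_size), 3))

    # Steps needed for 10 passes at batch size 64.
    print(steps_for_epochs(10, 64, dataset_size))
```

Plug in the batch size you actually trained with; if the result is around 1 or below, the model has not yet had a chance to revisit any sample.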

A loss of 1 would be reasonably good, I believe, if you are looking at the average loss. I think your network is running with sampled softmax loss. I am also interested to know where you got 3 million samples.
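On reading the average loss: per-batch losses are noisy, so the trend is easier to judge from a smoothed value. The sketch below shows one common way to do that, an exponentially decayed running average. It assumes running_average_loss is computed in roughly this way, which the question does not confirm; the function name and the decay value are illustrative.

```python
def running_average(losses, decay=0.999):
    """Smooth a sequence of per-batch losses with an exponential moving average.

    The first loss seeds the average; each later loss is blended in
    with weight (1 - decay).
    """
    avg = None
    smoothed = []
    for loss in losses:
        if avg is None:
            avg = loss
        else:
            avg = decay * avg + (1 - decay) * loss
        smoothed.append(avg)
    return smoothed


if __name__ == "__main__":
    # Noisy per-batch losses (made-up numbers, for illustration only).
    batch_losses = [4.0, 5.2, 1.1, 3.9, 4.8, 1.3]
    print(running_average(batch_losses, decay=0.9))
```

A decay close to 1 gives a smoother curve that reacts slowly. If the smoothed curve is still trending down over many thousands of steps, training has not finished, even when individual batches swing between 1 and 5.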
