I am working on getting a textsum implementation running and recently got my own scraped data fed in. I started training last night against 38,000 articles, and this morning the average loss was around 5.2. When I was playing with the textsum toy set I was able to quickly get down to around 0.0000054, but that was only against about 20 articles.
I was hoping someone with a bit more experience could give me a rough idea of how long training should take. I am currently running this on an Nvidia 980M. Last week I tried an AWS g2.2xlarge instance, but ironically my local machine seemed to process things faster than the instance's GRID K520. I still want to test out the P2 instances and also Google Cloud, but for now I think I will just work with my local machine.
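For what it's worth, when I compare hardware I time training steps directly rather than watching wall-clock loss. This is just a generic sketch (not part of textsum); `step_fn` is a hypothetical stand-in for whatever runs one training step, e.g. a single `sess.run` of the train op:

```python
import time

def steps_per_second(step_fn, n_steps=50, warmup=5):
    """Rough throughput benchmark: run a few warmup steps
    (to exclude one-time setup cost), then time n_steps."""
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_steps / elapsed

# Example with a dummy workload standing in for one training step:
rate = steps_per_second(lambda: sum(range(100000)))
print("steps/sec: %.1f" % rate)
```

Running this on both the local machine and a cloud instance gives a like-for-like number, independent of dataset size or loss curves.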
Any info anyone can provide on what I should expect would be appreciated. Thanks!