I am using TensorFlow 0.9 and training with the textsum model. I have about 1.3 million articles that I scraped and have been training against them for about a week now; the average training loss was about 1.75 to 2.1. I decided to stop and run eval, as it is my understanding that the eval average loss should be close to what I get during training. When I ran the eval, however, I saw an average loss of 2.6 to 2.9. I was just wondering what I should expect to see when performing this run.
Am I using this training/eval comparison correctly? I am somewhat new to deep learning and am using this project as a way to learn, and from some other reading it seems that this may be a rather large spread between the two.
Is there a standard tolerance for how much the average loss should differ when evaluating against a held-out dataset? At this point I'm not sure whether I should keep training, or stop here for now and try to figure out how to get this running in TensorFlow Serving. I don't want to overfit the model, but from an academic standpoint, let's say I did overfit via training. What would I need to do to "fix" it? Do I simply get more articles and feed that data in as training, or is the model essentially broken and unusable?
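To make concrete what I'm comparing, here is a minimal sketch (plain Python, not the actual textsum eval code) of the gap I'm measuring between my training and eval averages. The 20% threshold is just my own guess at a tolerance, not any established standard; the loss values are the midpoints of the ranges I observed:

```python
def relative_loss_gap(train_loss, eval_loss):
    """Relative increase of eval average loss over training average loss."""
    return (eval_loss - train_loss) / train_loss

# Midpoints of the ranges I observed over the last checkpoints.
train_avg = (1.75 + 2.1) / 2   # ~1.93
eval_avg = (2.6 + 2.9) / 2     # 2.75

gap = relative_loss_gap(train_avg, eval_avg)
# A threshold of 0.2 (20%) is just my guess, not a standard tolerance.
possibly_overfit = gap > 0.2
print("relative gap: %.2f, possible overfit: %s" % (gap, possibly_overfit))
```

With my numbers the relative gap comes out well above 20%, which is what prompted the question about whether this spread is normal.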