
I am working on TensorFlow's textsum (text summarization model). I have started training the model on the sample data, i.e. the toy dataset provided with the model when cloning it from git. I wanted to know how much time it will take to train the model and decode with the sample dataset. It has already been running for more than 17 hours.

Kajal Kodrani
  • As @Eilian has stated below, if you are running this on a CPU, you might be there a while. If you don't have access to a GPU, you might want to look at getting an AWS G2 or P2 instance: https://aws.amazon.com/ec2/instance-types/ When I ran training against the toy dataset, I got decent results with a very low average loss after about a day of training on my 980M. The important thing to note, though, is that you will not get valid results if you use the included toy-dataset vocab, since the words in the training set are not in the toy vocab file. https://github.com/tensorflow/models/issues/464 – xtr33me Oct 14 '16 at 15:33
  • I am running the training on GPU only, but I have changed max_run_steps to '2000'. It ran for 2-3 hrs and the model got trained. – Kajal Kodrani Oct 17 '16 at 05:24
  • Here, I have split the toy dataset into 17-4 (training-testing) and trained the model with the same vocab file, but I am facing an issue with the decode step. Do I need to modify the vocab file when I change the training data? How can I generate the vocab file from the training dataset? – Kajal Kodrani Oct 17 '16 at 05:27
  • Just to answer your question on the vocab file: all the vocab file represents is the individual words in the data trained against and the total number of times each one occurs. So if, out of all the data files, the word 'the' appeared 150 times, you would see 'the 150' in the vocab file. So when I created it, as part of my formatting of the raw data, I kept tallies of the counts and at the end output the data to the vocab file. – xtr33me Oct 17 '16 at 14:59
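For reference, a minimal sketch of generating a vocab file in that "word count" format might look like the following. This is not the official textsum tooling; the input paths and whitespace tokenization are assumptions for illustration.

    # Rough sketch (not the official textsum tooling) of building a vocab file
    # in the "word count" format described in the comment above.
    # Assumes the training articles are plain, whitespace-tokenized text files;
    # the paths below are illustrative.
    import collections
    import glob

    counts = collections.Counter()
    for path in glob.glob("data/training/*.txt"):  # hypothetical input location
        with open(path) as f:
            for line in f:
                counts.update(line.split())

    # Write one "word count" pair per line, most frequent words first.
    with open("data/vocab", "w") as out:
        for word, count in counts.most_common():
            out.write("%s %d\n" % (word, count))

If I recall correctly, the textsum data pipeline also expects a few special tokens (e.g. <s>, </s>, <UNK>, <PAD>) to be present in the vocab file, so those would need to be added if they do not occur in your data.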

2 Answers


Unfortunately, the toy training dataset is only meant to give you a way to watch the overall flow of the model; it is not meant to produce decent results. This is because there is just not enough data in the toy dataset to get good results.

The amount of time is difficult to estimate, as it is all relative to the hardware you are running on. You normally train until you get to an average loss between 2 and 1. Xin Pan stated that with larger datasets you should never go below a 1.0 average loss. On my 980M I was able to get there in less than a day with the toy dataset.

That said, my results were really bad and I thought there was something wrong. I found that the only thing wrong was that I didn't have enough data. I then scraped about 40k articles and the results were still not acceptable. Recently I have trained against 1.3 million articles and the results are much better. After further analysis, this is primarily because the textsum model is abstractive rather than extractive.

Hope this helps somewhat. For the 1.3 million articles with the batch size set to 64, I was able to train the model on my hardware in less than a week and a half using TF 0.9, CUDA 7.5 and cuDNN 4. I hear the newer cuDNN/CUDA are supposed to be faster, but I can't speak to that yet.

xtr33me

On my i5 processor, using only the CPU, it took about 60 hours to reach a running average loss of about 0.17 on the toy training dataset.

With 8 GB of RAM, it also consumed about 10 GB of additional swap. More RAM and use of a GPU might have produced better results. At present I am unable to show an image of the running average loss from TensorBoard, but I hope your query has been answered.

Ayushya