I am interested in using the TensorFlow slim library (tf.contrib.slim) to periodically evaluate a model's performance on an entire test set during training. The documentation is pretty clear that slim.evaluation.evaluation_loop is the way to go, and it looks promising. The issue is that I don't have a second GPU to spare: the model's parameters take up an entire GPU's worth of memory, and I would like to run evaluation concurrently with training.
For example, if I had 2 GPUs, I could run a Python script that ends with "slim.learning.train()" on the first GPU, and another that ends with "slim.evaluation.evaluation_loop()" on the second GPU.
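To illustrate, here is roughly the two-script setup I have in mind. The toy model, loss, and log directories below are placeholders I made up for the question, not my real code; each script is pinned to its own device via CUDA_VISIBLE_DEVICES:

```python
# train_gpu0.py -- hypothetical training script, pinned to the first GPU
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import tensorflow as tf
slim = tf.contrib.slim

# Stand-in model and loss; in practice this is the real input pipeline and network.
x = tf.random_normal([32, 10])
y = tf.reduce_sum(x, axis=1, keep_dims=True)
predictions = slim.fully_connected(x, 1, activation_fn=None)
loss = tf.losses.mean_squared_error(y, predictions)

train_op = slim.learning.create_train_op(
    loss, tf.train.GradientDescentOptimizer(0.01))

slim.learning.train(train_op, logdir='/tmp/model_logs', number_of_steps=1000)
```

```python
# eval_gpu1.py -- hypothetical evaluation script, pinned to the second GPU
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

import tensorflow as tf
slim = tf.contrib.slim

# Rebuild the same (stand-in) graph for evaluation.
x = tf.random_normal([32, 10])
y = tf.reduce_sum(x, axis=1, keep_dims=True)
predictions = slim.fully_connected(x, 1, activation_fn=None)

names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
    'mse': slim.metrics.streaming_mean_squared_error(predictions, y),
})

# Poll the training checkpoints and evaluate each new one as it appears.
slim.evaluation.evaluation_loop(
    master='',
    checkpoint_dir='/tmp/model_logs',
    logdir='/tmp/eval_logs',
    num_evals=10,
    eval_op=list(names_to_updates.values()),
    eval_interval_secs=60)
```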
Is there an approach that can manage a single GPU's resources for both tasks? tf.train.Supervisor comes to mind, but I honestly don't know.
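To make the question concrete, the kind of "managing" I'm imagining is something like capping how much GPU memory each process is allowed to claim, so the training and evaluation scripts could coexist on one device. This is just a sketch of the idea with the same stand-in model as above; the 0.6 fraction is a guess, and I don't know whether the two processes would actually fit or play nicely together:

```python
# Hypothetical idea only: cap the training process's GPU memory so a second
# (evaluation) process could use the remainder of the same GPU.
import tensorflow as tf
slim = tf.contrib.slim

# Same stand-in model/loss as in the training sketch above.
x = tf.random_normal([32, 10])
y = tf.reduce_sum(x, axis=1, keep_dims=True)
predictions = slim.fully_connected(x, 1, activation_fn=None)
loss = tf.losses.mean_squared_error(y, predictions)
train_op = slim.learning.create_train_op(
    loss, tf.train.GradientDescentOptimizer(0.01))

config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.6))

# slim.learning.train takes a session_config; presumably the eval script
# would need a similar cap for both to fit on one GPU.
slim.learning.train(train_op, logdir='/tmp/model_logs',
                    number_of_steps=1000, session_config=config)
```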