In TensorFlow's CIFAR-10 multi-GPU example, it seems (correct me if I am wrong) that one queue of training images is created per GPU. Wouldn't the "right" way of doing things be to have a single queue feeding all of the towers? If so, is there an example available of a shared queue?
You're correct that the code for the CIFAR-10 model uses multiple input queues (through multiple calls to `cifar10.distorted_inputs()` via `cifar10.tower_loss()`).
The easiest way to use a shared queue between the GPUs would be to do the following:
1. Increase the batch size by a factor of N, where N is the number of GPUs.

2. Move the call to `cifar10.distorted_inputs()` out of `cifar10.tower_loss()` and outside the loop over GPUs.

3. Split the `images` and `labels` tensors that are returned from `cifar10.distorted_inputs()` along the 0th (batch) dimension:

   ```python
   images, labels = cifar10.distorted_inputs()
   split_images = tf.split(0, FLAGS.num_gpus, images)
   split_labels = tf.split(0, FLAGS.num_gpus, labels)
   ```

4. Modify `cifar10.tower_loss()` to take `images` and `labels` arguments, and invoke it as follows (a sketch of the modified function is shown after this list):

   ```python
   for i in xrange(FLAGS.num_gpus):
     with tf.device('/gpu:%d' % i):
       with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
         loss = tower_loss(scope, split_images[i], split_labels[i])
   ```
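For reference, here is a minimal sketch of what the modified `tower_loss()` could look like. It assumes the rest of the function body stays the same as in the original `cifar10_multi_gpu_train.py` (building inference with `cifar10.inference()` and the loss with `cifar10.loss()`); only the signature changes and the internal call to `cifar10.distorted_inputs()` is removed:

```python
# Sketch only: assumes `tf` and the `cifar10` tutorial module are imported
# as in the original cifar10_multi_gpu_train.py script.
def tower_loss(scope, images, labels):
  """Calculate the total loss on a single tower.

  `images` and `labels` are slices of the single shared input batch,
  rather than being read from a per-tower input queue.
  """
  # Build the inference graph for this tower.
  logits = cifar10.inference(images)

  # Build the loss portion of the graph; the individual loss terms are
  # added to the 'losses' collection as a side effect.
  _ = cifar10.loss(logits, labels)

  # Assemble the total loss for the current tower only.
  losses = tf.get_collection('losses', scope)
  total_loss = tf.add_n(losses, name='total_loss')
  return total_loss
```

Because each tower now consumes one slice of a single (N-times larger) batch, all GPUs are fed from the same underlying input queue.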

mrry