
Recently I have been trying to learn how to use TensorFlow with multiple GPUs to speed up training. I found an official tutorial about training a classification model on the CIFAR-10 dataset. However, that tutorial reads images using a queue. Out of curiosity: how can I use multiple GPUs by feeding values into a Session? I can't figure out how to feed different values from the same dataset to different GPUs. Thank you, everybody! The following code is part of the official tutorial.

images, labels = cifar10.distorted_inputs()
batch_queue = tf.contrib.slim.prefetch_queue.prefetch_queue(
      [images, labels], capacity=2 * FLAGS.num_gpus)
# Calculate the gradients for each model tower.
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
  for i in xrange(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
      with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
        # Dequeues one batch for the GPU
        image_batch, label_batch = batch_queue.dequeue()
        # Calculate the loss for one tower of the CIFAR model. This function
        # constructs the entire CIFAR model but shares the variables across
        # all towers.
        loss = tower_loss(scope, image_batch, label_batch)

        # Reuse variables for the next tower.
        tf.get_variable_scope().reuse_variables()

        # Retain the summaries from the final tower.
        summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, scope)

        # Calculate the gradients for the batch of data on this CIFAR tower.
        grads = opt.compute_gradients(loss)

        # Keep track of the gradients across all towers.
        tower_grads.append(grads)
Sean

2 Answers


The core idea of the multi-GPU example is that you explicitly assign operations to a tf.device. The example loops over FLAGS.num_gpus devices and creates a replica for each of the GPUs.

If you create placeholder ops inside the for loop, they will get assigned to their respective devices. All you need to do is keep handles to the created placeholders and then feed them all independently in a single session.run call.

placeholders = []
for i in range(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
        plc = tf.placeholder(tf.int32)
        placeholders.append(plc)

# allow_soft_placement lets ops without a GPU kernel (e.g. some int32 ops)
# fall back to the CPU, as in the official multi-GPU tutorial.
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    fd = {plc: i for i, plc in enumerate(placeholders)}
    # sum(placeholders) adds all the placeholders in the graph, so this
    # prints the sum of the numbers from 0 to FLAGS.num_gpus - 1.
    print(sess.run(sum(placeholders), feed_dict=fd))

To address your specific example, it should suffice to replace the batch_queue.dequeue() call with the construction of two placeholders (for the image_batch and label_batch tensors), keep handles to those placeholders, and feed them the values you need, as in the sketch below.
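
A minimal sketch of that change against the tutorial loop (the placeholder shapes and the image_placeholders/label_placeholders names are my assumptions; 24x24x3 matches the distorted-input size used by the CIFAR-10 tutorial):

image_placeholders, label_placeholders = [], []
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(FLAGS.num_gpus):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
                # Placeholders replace batch_queue.dequeue(); the shapes
                # should match the output of your preprocessing.
                image_batch = tf.placeholder(
                    tf.float32, [FLAGS.batch_size, 24, 24, 3])
                label_batch = tf.placeholder(tf.int32, [FLAGS.batch_size])
                image_placeholders.append(image_batch)
                label_placeholders.append(label_batch)

                loss = tower_loss(scope, image_batch, label_batch)
                tf.get_variable_scope().reuse_variables()
                tower_grads.append(opt.compute_gradients(loss))

At run time, build one feed_dict that maps each tower's image and label placeholders to a different NumPy batch, and pass it to a single sess.run call.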

Another (somewhat hacky) way is to override the image_batch and label_batch tensors directly in the session.run call, because you can feed_dict any tensor (not just a placeholder). You will still need to keep handles to those tensors so you can reference them in the run call.
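
A toy demonstration of this behavior (not the CIFAR code; a and b are just example tensors):

a = tf.constant(3.0)
b = a * 2.0
with tf.Session() as sess:
    print(sess.run(b))                       # 6.0, uses a's constant value
    print(sess.run(b, feed_dict={a: 10.0}))  # 20.0, feed_dict overrides a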

Jindra Helcl
  • Thank you for your patient explanation. But I am still confused about the code after the session starts. What is the meaning of the sum of the placeholders? – Sean Dec 23 '18 at 05:23
  • It's just an example of how you can reference the placeholders. In your case, you would use `session.run` to fetch a different value (e.g. a training operation), but providing the feed dict the way described above. – Jindra Helcl Dec 23 '18 at 09:04

The QueueRunner and queue-based APIs are relatively outdated; the TensorFlow docs clearly state:

Input pipelines using the queue-based APIs can be cleanly replaced by the tf.data API

As a result, it is recommended to use the tf.data API. It is optimized for multi-GPU and TPU use.

How to use it?

dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
dataset = dataset.batch(32)  # batch so x has rank >= 2, as tf.layers.dense requires
iterator = dataset.make_one_shot_iterator()
x, y = iterator.get_next()
# define your model; use x directly in it
logits = tf.layers.dense(x, 2)
# this assumes y_train holds integer class ids; use
# tf.nn.softmax_cross_entropy_with_logits instead if your labels are one-hot
cost = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y))
train_step = tf.train.AdamOptimizer().minimize(cost)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    while True:
        try:
            sess.run(train_step)
        except tf.errors.OutOfRangeError:
            break  # the one-shot iterator is exhausted after one pass

You can create a separate iterator for each GPU with Dataset.shard(), or more easily use the Estimator API. A sketch of the sharding approach follows.
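
A minimal sketch of per-GPU sharding, reusing FLAGS.num_gpus from the question; build_tower is a hypothetical stand-in for your per-GPU model construction, and the batch size is arbitrary:

dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
for i in range(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
        # Each GPU reads every num_gpus-th example, starting at offset i.
        shard = dataset.shard(FLAGS.num_gpus, i).batch(32)
        x_i, y_i = shard.make_one_shot_iterator().get_next()
        # build_tower is a hypothetical function that builds one model replica
        loss_i = build_tower(x_i, y_i)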

For a complete tutorial see here.

Amir
  • Thank you for your detailed explanation. I think I get your point. I still have one more question: it seems that the values fed into a neural network are not limited to the training and testing data. For instance, in the Generative Adversarial Network framework, we also need to feed a different z (Gaussian noise) to each GPU. Can I also use the tf.data API for that, or should I write the iterator myself? – Sean Dec 23 '18 at 05:27
  • Yes you can. tf.data.Dataset.from_tensor_slices(tf.random_uniform([total_training_samples, seq_length, z_dim], minval=0, maxval=1, dtype=tf.float32)) – Amir Dec 23 '18 at 05:49
  • Thank you so much! By the way, do I need to set up several input iterators if I train my model on multiple GPUs? – Sean Dec 23 '18 at 06:22
  • Take a look at this answer by a googler @mrry: https://stackoverflow.com/questions/46965098/how-does-one-move-data-to-multiple-gpu-towers-using-tensorflows-dataset-api – Amir Dec 23 '18 at 06:49
  • it is my pleasure. – Amir Dec 24 '18 at 14:19