
I'm looking at porting from a different production machine learning framework to TensorFlow. In our current system, for both training and inference, we load a copy of our model onto every GPU on the machine.

I would like to keep this way of load-balancing for now. Where can I find a simple example of loading one copy of a TF model onto each GPU that's available on a machine?

empty

1 Answer


Here's an example from https://github.com/rafaljozefowicz/lm/blob/master/language_model.py#L21

You wrap your model-creation code in a `_forward` function and then call it once for each GPU:

    losses, tower_grads = [], []
    for i in range(hps.num_gpus):
        # Build one "tower" per GPU: pin its ops to GPU i and reuse the
        # variables created by the first tower for all later ones.
        with tf.device(assign_to_gpu(i, ps_device)), tf.variable_scope(tf.get_variable_scope(),
                                                                       reuse=True if i > 0 else None):
            loss = self._forward(i, xs[i], ys[i], ws[i])
            losses += [loss]
            if mode == "train":
                # Each tower computes its own gradients; only the last tower
                # writes summaries.
                cur_grads = self._backward(loss, summaries=(i == hps.num_gpus - 1))
                tower_grads += [cur_grads]

    # Average the per-tower losses into a single loss op.
    self.loss = tf.add_n(losses) / len(losses)
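
For reference, here is a minimal, self-contained version of the same tower pattern using the TF 1.x graph API. The tiny softmax model, the placeholder shapes, and `NUM_GPUS` are illustrative assumptions, not taken from the linked repo:

    import tensorflow as tf  # TF 1.x graph-mode API, as in the excerpt above

    NUM_GPUS = 2  # assumption: set this to the number of GPUs on the machine

    def _forward(x, y):
        # One "tower": a tiny softmax model whose variables are shared
        # across towers via tf.get_variable.
        w = tf.get_variable("w", [784, 10])
        b = tf.get_variable("b", [10], initializer=tf.zeros_initializer())
        logits = tf.matmul(x, w) + b
        return tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))

    xs = [tf.placeholder(tf.float32, [None, 784]) for _ in range(NUM_GPUS)]
    ys = [tf.placeholder(tf.int64, [None]) for _ in range(NUM_GPUS)]

    losses = []
    for i in range(NUM_GPUS):
        # Pin tower i's ops to GPU i; reuse the first tower's variables.
        with tf.device("/gpu:%d" % i), \
                tf.variable_scope(tf.get_variable_scope(), reuse=True if i > 0 else None):
            losses.append(_forward(xs[i], ys[i]))

    loss = tf.add_n(losses) / len(losses)

    # One sess.run(loss) evaluates every tower; TensorFlow schedules the
    # per-GPU subgraphs in parallel.
    sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
    sess.run(tf.global_variables_initializer())

With `allow_soft_placement=True` the ops fall back to the CPU when a GPU isn't available, which lets you try the sketch on a machine with fewer than `NUM_GPUS` devices.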
Yaroslav Bulatov
  • I guess I don't understand this example. It looks like it forward propagates on one GPU then on another GPU. Serially. I want to run the models on the GPUs in parallel. – empty Oct 16 '17 at 23:12
  • The example forward propagates 8 models on 8 GPUs in parallel, then adds up their losses. – Yaroslav Bulatov Oct 17 '17 at 00:23
  • I'm not seeing any multithreading here. Where is the parallelism manifested? – empty Oct 17 '17 at 16:30
  • Parallelism is handled by the TensorFlow backend. When you issue `sess.run` it will start a dataflow that executes all ready ops in parallel. You could also do this by using 8 `session.run` calls issued in parallel from 8 Python threads, but that's less efficient (a sketch of that threaded variant appears after these comments). – Yaroslav Bulatov Oct 17 '17 at 16:34
  • This would seem to more closely fit my case. https://github.com/tensorflow/models/blob/91c7b91f834a5a857e8168b96d6db3b93d7b9c2a/tutorials/image/cifar10/cifar10_multi_gpu_train.py but https://github.com/tflearn/tflearn/issues/696 says that this is also difficult. – empty Oct 18 '17 at 18:11
  • The documentation hasn't kept up for higher-level frameworks (I have no idea how to do it in tflearn). I stick to the low-level approach, as in language_model.py, since it works. – Yaroslav Bulatov Oct 18 '17 at 18:30
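
A sketch of the threaded alternative mentioned in the comments: one Python thread per GPU, each issuing its own `sess.run`. It assumes the per-tower `losses`, matching `feeds`, `NUM_GPUS`, and a `sess` have already been built as in the answer above, and it is generally less efficient than a single `sess.run` over the combined graph:

    import threading

    def run_tower(sess, op, feed):
        # tf.Session.run is thread-safe, so each thread can drive its own tower.
        sess.run(op, feed_dict=feed)

    threads = []
    for i in range(NUM_GPUS):
        t = threading.Thread(target=run_tower, args=(sess, losses[i], feeds[i]))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()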