I am performing brain surgery on an existing single-GPU TensorFlow solution to make it work with multiple GPUs.
I've already asked "Simple TensorFlow example loading one copy of a model onto each GPU that's available" (thanks Yaroslav!), but I'm having trouble adapting the existing code so that the results of the forward/backward propagation are reconciled across the replicated models (roughly the pattern sketched below).
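For reference, this is a minimal sketch of the in-graph, one-tower-per-GPU pattern I'm trying to adapt my code to (essentially what the CIFAR-10 multi-GPU tutorial does). The model here is a made-up stand-in for my real network, and NUM_GPUS is hard-coded just for illustration:

```python
import tensorflow as tf

NUM_GPUS = 2

def build_tower(x, y):
    # Stand-in for the existing single-GPU model: returns a scalar loss.
    logits = tf.layers.dense(x, 10, name="logits")
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.int64, [None])
opt = tf.train.GradientDescentOptimizer(0.01)

# Split each batch across the GPUs.
x_splits = tf.split(x, NUM_GPUS)
y_splits = tf.split(y, NUM_GPUS)

tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(NUM_GPUS):
        with tf.device("/gpu:%d" % i), tf.name_scope("tower_%d" % i):
            loss = build_tower(x_splits[i], y_splits[i])
            # Share the same weights across all towers.
            tf.get_variable_scope().reuse_variables()
            tower_grads.append(opt.compute_gradients(loss))

# "Reconcile" the towers: average each variable's gradient across the GPUs.
avg_grads = []
for grads_and_vars in zip(*tower_grads):
    grads = [g for g, _ in grads_and_vars]
    avg_grads.append((tf.reduce_mean(tf.stack(grads), axis=0),
                      grads_and_vars[0][1]))

train_op = opt.apply_gradients(avg_grads)
```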
Looking at "Distributed tensorflow: the difference between In-graph replication and Between-graph replication", I see @mrry's answer:
"in-graph replication" is the first approach that we tried in TensorFlow, and it did not achieve the performance that many users required, so the more complicated "between-graph" approach is the current recommended way to perform distributed training. Higher-level libraries such as tf.learn use the "between-graph" approach for distributed training.
Where can I find a simple example of tf.learn loading one copy of a single model onto each GPU in a multi-GPU configuration?
I see that this has also been raised as a GitHub issue: https://github.com/tflearn/tflearn/issues/696