Does a TensorFlow convnet only duplicate the model across multiple GPUs?

Question

I am currently running a TensorFlow convnet for image recognition, and I am considering buying new GPUs to allow more complex graphs, larger batch sizes, and larger input dimensions. I have read posts like this that do not recommend using AWS GPU instances to train convnets, but more opinions are always welcome.

I've read TensorFlow's guide 'Training a Model Using Multiple GPU Cards', and it seems that the graph is duplicated across the GPUs. Is this the only way to use multiple GPUs in parallel for a TensorFlow convnet?

I am asking because if TensorFlow can only duplicate graphs across multiple GPUs, each GPU must have at least as much memory as my model requires for one batch. (For example, if the minimum memory required is 5 GB, two cards with 4 GB each would not do the job.)

Thank you in advance!

TensorFlow lets you take a single graph and split it across multiple GPUs in an arbitrary fashion, using `with tf.device` annotations. — Yaroslav Bulatov, Jun 22 '17 at 15:01

score 0 · Accepted Answer · answered Jul 28 '17 at 23:45

No, it is definitely possible to place different variables on different GPUs. For every variable and every layer that you declare, you can choose the device on which it is declared.
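To make the per-layer choice concrete for the 5 GB example in the question, here is a rough sketch in plain Python (no TensorFlow needed to run it). The layer names, the per-layer memory figures, and the `assign_devices` helper are all made up for illustration. It only shows how a model that does not fit on one 4 GB card could be spread over two, and it ignores the cost of moving activations between GPUs.

```python
# Hypothetical per-layer memory footprints in GB for one batch.
# The numbers are invented and total 5 GB, matching the example in the question.
LAYER_GB = [("conv1", 1.5), ("conv2", 1.5), ("fc1", 1.0), ("fc2", 1.0)]


def assign_devices(layers, gpu_capacity_gb):
    """Greedily place consecutive layers on GPUs without exceeding capacity.

    Returns a dict mapping layer name -> device string such as "/gpu:0".
    Raises ValueError if the layers do not fit on the given GPUs.
    """
    placement = {}
    gpu = 0
    used = 0.0
    for name, size in layers:
        # Advance to the next GPU while this layer does not fit on the current one.
        while gpu < len(gpu_capacity_gb) and used + size > gpu_capacity_gb[gpu]:
            gpu += 1
            used = 0.0
        if gpu == len(gpu_capacity_gb):
            raise ValueError("layer %r does not fit on the remaining GPUs" % name)
        placement[name] = "/gpu:%d" % gpu
        used += size
    return placement


# Two 4 GB cards: the 5 GB model fits once it is split by layer.
placement = assign_devices(LAYER_GB, [4.0, 4.0])
for name, device in placement.items():
    print(name, "->", device)
```

In a real graph, each device string would be passed to `tf.device`, for example `with tf.device(placement["conv1"]):` around the code that declares that layer's variables and ops.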

Even in the specific case where you use multiple GPUs only to duplicate your model, so that you can increase the batch_size training parameter and train faster, you still need to build the model explicitly around shared parameters and manage how those parameters are communicated between GPUs.
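As an illustration of the shared-parameter idea, here is a toy sketch in plain Python rather than TensorFlow. The one-parameter linear model and the `grad` and `train_step` helpers are assumptions made for the example. Each "replica" stands in for one GPU: it reads the same shared weight, computes a gradient on its own slice of the batch, and the gradients are averaged before a single update.

```python
def grad(w, shard):
    """Gradient of the mean squared error for y ~ w * x over one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def train_step(w, batch, num_replicas, lr=0.01):
    """One synchronous data-parallel step with a single shared parameter w.

    Assumes len(batch) is divisible by num_replicas; leftover examples
    would otherwise be dropped by the slicing below.
    """
    shard_size = len(batch) // num_replicas
    shards = [batch[i * shard_size:(i + 1) * shard_size]
              for i in range(num_replicas)]
    # Every replica (one per GPU in the real setting) reads the same shared w.
    replica_grads = [grad(w, shard) for shard in shards]
    # Average the per-replica gradients, then update the shared parameter once.
    avg_grad = sum(replica_grads) / num_replicas
    return w - lr * avg_grad


# Toy data generated with a true weight of 3.
batch = [(float(x), 3.0 * x) for x in range(1, 9)]
w = 0.0
for _ in range(200):
    w = train_step(w, batch, num_replicas=2)
print(round(w, 3))
```

With equal-sized shards, the averaged gradient equals the gradient over the whole batch, so adding replicas changes how the work is divided and not the update itself. The averaging step is the part you have to manage yourself when the replicas sit on different GPUs.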
