I have three models defined under different device scopes in TensorFlow, and I'm training them with GradientTape. When the models are created, memory usage rises by a few hundred megabytes on each GPU, which shows that each model has loaded onto its respective device. The problem is that once training starts, even with a very small batch size, only the memory on the GPU at position 0 increases. Is there any way to ensure that only the GPU assigned to each model is used for that model?
import tensorflow as tf

with tf.device('/device:GPU:0'):
    model1 = model1Class().model()

with tf.device('/device:GPU:1'):
    model2 = model2Class().model()

with tf.device('/device:GPU:2'):
    model3 = model3Class().model()
for epoch in range(10):
    dataGen = DataGenerator(...)
    X, y = next(dataGen)

    # Train model1 on the raw batch
    with tf.GradientTape() as tape1:
        X = model1(X)
        loss1 = lossFunc(X, y[1])
    grads1 = tape1.gradient(loss1, model1.trainable_weights)
    optimizer1.apply_gradients(zip(grads1, model1.trainable_weights))

    # Train model2 on the output of model1
    with tf.GradientTape() as tape2:
        X = model2(X)
        loss2 = lossFunc(X, y[2])
    grads2 = tape2.gradient(loss2, model2.trainable_weights)
    optimizer2.apply_gradients(zip(grads2, model2.trainable_weights))

    # Train model3 on the output of model2
    with tf.GradientTape() as tape3:
        X = model3(X)
        loss3 = lossFunc(X, y[3])
    grads3 = tape3.gradient(loss3, model3.trainable_weights)
    optimizer3.apply_gradients(zip(grads3, model3.trainable_weights))