I have three models defined under different device scopes in TensorFlow, and I'm training them with GradientTape. When the models are created, memory usage rises by a few hundred megabytes on each GPU, which shows that each model has loaded onto its respective device. The problem is that once training starts, even with a very small batch size, only the memory on the GPU at position 0 increases. Is there any way to ensure that only the GPU assigned to each model is used for that model?
import tensorflow as tf

with tf.device('/device:GPU:0'):
    model1 = model1Class().model()

with tf.device('/device:GPU:1'):
    model2 = model2Class().model()

with tf.device('/device:GPU:2'):
    model3 = model3Class().model()
for epoch in range(10):
    dataGen = DataGenerator(...)
    X, y = next(dataGen)

    # Train model1 on the raw batch
    with tf.GradientTape() as tape1:
        X = model1(X)
        loss1 = lossFunc(X, y[1])
    grads1 = tape1.gradient(loss1, model1.trainable_weights)
    optimizer1.apply_gradients(zip(grads1, model1.trainable_weights))

    # Train model2 on the output of model1
    with tf.GradientTape() as tape2:
        X = model2(X)
        loss2 = lossFunc(X, y[2])
    grads2 = tape2.gradient(loss2, model2.trainable_weights)
    optimizer2.apply_gradients(zip(grads2, model2.trainable_weights))

    # Train model3 on the output of model2
    with tf.GradientTape() as tape3:
        X = model3(X)
        loss3 = lossFunc(X, y[3])
    grads3 = tape3.gradient(loss3, model3.trainable_weights)
    optimizer3.apply_gradients(zip(grads3, model3.trainable_weights))