I have my workload partitioned across two GPUs (aka model partitioning). By default, TF/Keras allocates all the gradients on GPU0, but I want to use colocate_gradients_with_ops to spread the allocation across the two GPUs.

I'm looking for a simple way to do that in Keras. My approach was to create a new optimizer subclassed from tf.train.AdamOptimizer just to flip the default value of colocate_gradients_with_ops from False to True, and I have to flip it in two methods!

I'm looking for a shorter, more direct way in Keras than the one below.
import tensorflow as tf

class MyAdamOptimizer(tf.train.AdamOptimizer):

    def compute_gradients(self,
                          loss,
                          var_list=None,
                          gate_gradients=tf.train.Optimizer.GATE_OP,
                          aggregation_method=None,
                          colocate_gradients_with_ops=True,
                          grad_loss=None):
        # Forward the caller's arguments, only flipping the colocation default.
        return super(MyAdamOptimizer, self).compute_gradients(
            loss,
            var_list=var_list,
            gate_gradients=gate_gradients,
            aggregation_method=aggregation_method,
            colocate_gradients_with_ops=True,
            grad_loss=grad_loss)

    def minimize(self,
                 loss,
                 global_step=None,
                 var_list=None,
                 gate_gradients=tf.train.Optimizer.GATE_OP,
                 aggregation_method=None,
                 colocate_gradients_with_ops=True,
                 name=None,
                 grad_loss=None):
        return super(MyAdamOptimizer, self).minimize(
            loss,
            global_step=global_step,
            var_list=var_list,
            gate_gradients=gate_gradients,
            aggregation_method=aggregation_method,
            colocate_gradients_with_ops=True,
            name=name,
            grad_loss=grad_loss)
Then I call:

model.compile(optimizer=MyAdamOptimizer(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
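For context, here is a rough sketch (plain TF 1.x, with made-up layer sizes and placement, not my actual model) of what I'm ultimately trying to reach through Keras: when the training op is built by hand, the flag is just a keyword argument of minimize():

import tensorflow as tf

# Hypothetical two-GPU split, only to show where the flag normally goes.
with tf.device('/gpu:0'):
    x = tf.placeholder(tf.float32, [None, 10])
    h = tf.layers.dense(x, 32, activation=tf.nn.relu)
with tf.device('/gpu:1'):
    y = tf.layers.dense(h, 1)
    loss = tf.reduce_mean(tf.square(y))

# In plain TF the colocation flag is passed directly, so each gradient op
# is placed on the same GPU as the forward op it belongs to.
train_op = tf.train.AdamOptimizer(0.001).minimize(
    loss, colocate_gradients_with_ops=True)

With Keras driving the optimizer, I don't get to pass that argument anywhere, hence the subclass above.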