
I have my workload partitioned across two GPUs (i.e., model partitioning). By default, TF/Keras places all of the gradient ops on GPU0, but I want to use colocate_gradients_with_ops to spread them across both GPUs.

I'm looking for a simple way to do that in Keras. My approach was to create a new optimizer subclassed from tf.train.AdamOptimizer just to flip the default value of colocate_gradients_with_ops from False to True. I also have to flip it in two methods!

I'm looking for a shorter, more direct way in Keras than the code below.

import tensorflow as tf

class MyAdamOptimizer(tf.train.AdamOptimizer):
    """AdamOptimizer with colocate_gradients_with_ops forced to True."""

    def compute_gradients(self,
                          loss,
                          var_list=None,
                          gate_gradients=tf.train.Optimizer.GATE_OP,
                          aggregation_method=None,
                          colocate_gradients_with_ops=True,
                          grad_loss=None):
        # Forward the caller's arguments and keep the colocation flag True.
        return super(MyAdamOptimizer, self).compute_gradients(
            loss,
            var_list=var_list,
            gate_gradients=gate_gradients,
            aggregation_method=aggregation_method,
            colocate_gradients_with_ops=True,
            grad_loss=grad_loss)

    def minimize(self,
                 loss,
                 global_step=None,
                 var_list=None,
                 gate_gradients=tf.train.Optimizer.GATE_OP,
                 aggregation_method=None,
                 colocate_gradients_with_ops=True,
                 name=None,
                 grad_loss=None):
        # Forward the caller's arguments and keep the colocation flag True.
        return super(MyAdamOptimizer, self).minimize(
            loss,
            global_step=global_step,
            var_list=var_list,
            gate_gradients=gate_gradients,
            aggregation_method=aggregation_method,
            colocate_gradients_with_ops=True,
            name=name,
            grad_loss=grad_loss)

Then I call

model.compile(optimizer=MyAdamOptimizer(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
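
As a sanity check (a sketch, assuming TF 1.x with the session-based Keras backend, not part of my actual training code), device-placement logging shows which GPU each gradient op ends up on:

import tensorflow as tf
from tensorflow import keras

# Log every op's device assignment; gradient ops appear under the
# "gradients/..." name scope, so it is easy to see which GPU got them.
config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
keras.backend.set_session(tf.Session(config=config))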

1 Answer

There is no simpler way. Keras' own Adam optimizer is implemented from basic operators, so it does not expose colocate_gradients_with_ops; you have to wrap a TensorFlow optimizer in a custom class to set that flag. If the goal is better multi-GPU performance, you can also try Keras-MXNet's Adam optimizer: we overloaded Keras' Optimizer class and get better efficiency on multiple GPUs, without changes to your training code.
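
If you stay with a custom optimizer, the wrapper can at least be kept shorter. Since tf.train.Optimizer.minimize() delegates to compute_gradients() in TF 1.x, overriding only compute_gradients and forcing the flag there should cover both entry points. A minimal sketch (ColocatingAdamOptimizer is just an illustrative name):

import tensorflow as tf

class ColocatingAdamOptimizer(tf.train.AdamOptimizer):
    def compute_gradients(self, *args, **kwargs):
        # Force colocation regardless of what the caller (or minimize()) passed.
        kwargs['colocate_gradients_with_ops'] = True
        return super(ColocatingAdamOptimizer, self).compute_gradients(*args, **kwargs)

# Usage is the same as before:
# model.compile(optimizer=ColocatingAdamOptimizer(learning_rate=0.001),
#               loss='categorical_crossentropy',
#               metrics=['accuracy'])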
