I am trying to modify Mask R-CNN code so it runs on multiple GPUs, based on the CIFAR-10 multi-GPU example. Most of the relevant code is below.
One image and its ground-truth information are read from a TFRecords file as follows:
image, ih, iw, gt_boxes, gt_masks, num_instances, img_id = \
    datasets.get_dataset(FLAGS.dataset_name,
                         FLAGS.dataset_split_name,
                         FLAGS.dataset_dir,
                         FLAGS.im_batch,
                         is_training=True)
Here the size of image and the value of num_instances differ from image to image, so these inputs are stored in a RandomShuffleQueue as follows:
data_queue = tf.RandomShuffleQueue(capacity=32, min_after_dequeue=16,
                                   dtypes=(image.dtype, ih.dtype, iw.dtype,
                                           gt_boxes.dtype, gt_masks.dtype,
                                           num_instances.dtype, img_id.dtype))
enqueue_op = data_queue.enqueue((image, ih, iw, gt_boxes, gt_masks, num_instances, img_id))
data_queue_runner = tf.train.QueueRunner(data_queue, [enqueue_op] * 4)
tf.add_to_collection(tf.GraphKeys.QUEUE_RUNNERS, data_queue_runner)
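For context, the queue registered above only gets filled once its runner threads are started inside a session. A minimal sketch of how that happens, assuming the standard TF 1.x Coordinator pattern (the session config is my choice, not from a library requirement):

# start the queue runners registered in the QUEUE_RUNNERS collection,
# including data_queue_runner above; training then dequeues from data_queue
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
sess.run(tf.global_variables_initializer())
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)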
Then I use tower_grads to gather the gradients from each GPU and average them. Below is the multi-GPU code (a sketch of average_grads follows the block):
tower_grads = []
num_gpus = 2
with tf.variable_scope(tf.get_variable_scope()):
    for i in xrange(num_gpus):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('tower_%d' % i) as scope:
                # each tower dequeues its own training example
                (image, ih, iw, gt_boxes, gt_masks,
                 num_instances, img_id) = data_queue.dequeue()
                # restore the static channel dimension lost after dequeue
                im_shape = tf.shape(image)
                image = tf.reshape(image, (im_shape[0], im_shape[1], im_shape[2], 3))
                total_loss = compute_loss()  # uses the dequeued tensors to compute the loss
                grads = compute_grads(total_loss)
                tower_grads.append(grads)
grads = average_grads(tower_grads)
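My average_grads follows the average_gradients function from the CIFAR-10 multi-GPU example; a minimal sketch of what it does (the function name is from my code, not a library API):

def average_grads(tower_grads):
    # tower_grads is a list over GPUs; each element is a list of
    # (gradient, variable) pairs for that tower
    averaged = []
    for grad_and_vars in zip(*tower_grads):
        # grad_and_vars is ((grad_gpu0, var), (grad_gpu1, var), ...)
        grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
        grad = tf.reduce_mean(tf.concat(grads, 0), 0)
        # variables are shared across towers, so keep the first one
        averaged.append((grad, grad_and_vars[0][1]))
    return averaged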
When num_gpus = 1, the code works well (i.e., there is no error), but when I use two TITAN X GPUs, I get strange errors such as:
- failed to enqueue async memset operation: CUDA_ERROR_INVALID_HANDLE
- Internal: Blas GEMM launch failed
The error is not the same across runs. I can't figure out why these errors occur only with multiple GPUs. Is there some conflict on the data queue or between the GPUs?