
I am training my first multi-GPU model with TensorFlow. As the tutorial states, the variables should be pinned to the CPU and the ops placed on each GPU under a name_scope.

When I run a small test with device placement logging enabled, I can see the ops being placed on their respective GPUs with the TOWER_0/TOWER_1 prefixes, but the variables are not being placed on the CPU.

Am I missing something, or have I misread the device placement log?

I am attaching the test code, and here is the device placement log.

Thanks

TEST CODE

# Imports assumed by this snippet (TF 1.x, with the slim models repo on the path)
import tensorflow as tf
import tensorflow.contrib.slim as slim
from nets import inception_v3  # from the tensorflow/models slim nets

with tf.device('/cpu:0'):
    imgPath=tf.placeholder(tf.string)
    imageString=tf.read_file(imgPath)
    imageJpeg=tf.image.decode_jpeg(imageString, channels=3)
    inputImage=tf.image.resize_images(imageJpeg, [299,299])
    inputs  = tf.expand_dims(inputImage, 0)
    for i in range(2):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('%s_%d' % ('TOWER', i)) as scope:
                with slim.arg_scope([tf.contrib.framework.python.ops.variables.variable], device='/cpu:0'):
                    with slim.arg_scope(inception_v3.inception_v3_arg_scope()):
                        logits,endpoints = inception_v3.inception_v3(inputs, num_classes=1001, is_training=False)
                tf.get_variable_scope().reuse_variables()

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True,log_device_placement=True)) as sess:
    tf.initialize_all_variables().run()
exit(0)

EDIT Basically, the line `with slim.arg_scope([tf.contrib.framework.python.ops.variables.variable], device='/cpu:0'):` should force all the variables onto the CPU, but they are created on `gpu:0` instead.

  • Well, the variables until `expand_dims` are placed in `cpu`, as you requested with `with tf.device('cpu:0'):`. All the variables connected to `inception` model are placed in `gpu`. – sygi Nov 12 '16 at 13:20
  • Thanks, I understand that. What is the role of `with slim.arg_scope([tf.contrib.framework.python.ops.variables.variable], device='/cpu:0'):` though? – Ashish Kumar Nov 12 '16 at 14:36
  • Isn't `allow_soft_placement` interfering? If you set it to `False`, it should place it where you told it to (or fail). – drpng Nov 12 '16 at 15:54
  • According to this [inception_train](https://github.com/tensorflow/models/blob/master/inception/inception/inception_train.py) example, `allow_soft_placement=True` is needed because some ops might not have GPU implementations. – Ashish Kumar Nov 12 '16 at 17:49
  • Hi @AshishKumar, have you found the reason? Could you please share what you found with me? Thanks! – ROBOT AI Jan 07 '17 at 18:25

1 Answer


Try with:

with slim.arg_scope([slim.model_variable, slim.variable], device='/cpu:0'):

This was taken from: model_deploy
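
A likely explanation for why the original line had no effect: slim's `arg_scope` stores its default-argument overrides keyed by the exact function object you pass in, and the Inception layers create their variables through the slim wrappers (`slim.model_variable` / `slim.variable`), not through the low-level `tf.contrib.framework...variables.variable` op that was scoped. Scoping a function the model never actually calls is silently ignored. A minimal pure-Python sketch of that mechanism (all names here are simplified stand-ins, not the real slim implementation):

```python
from contextlib import contextmanager
import functools

# function object -> default kwargs currently in force for it
_SCOPE = {}

@contextmanager
def arg_scope(funcs, **defaults):
    """Push default kwargs for the given functions, pop them on exit."""
    saved = {f: _SCOPE.get(f) for f in funcs}
    for f in funcs:
        _SCOPE[f] = {**(_SCOPE.get(f) or {}), **defaults}
    try:
        yield
    finally:
        for f, old in saved.items():
            if old is None:
                del _SCOPE[f]
            else:
                _SCOPE[f] = old

def add_arg_scope(f):
    """Make f pick up defaults that an enclosing arg_scope registered for it."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        merged = {**_SCOPE.get(wrapper, {}), **kwargs}
        return f(*args, **merged)
    return wrapper

@add_arg_scope
def model_variable(name, device='/gpu:0'):
    # Stand-in for slim.model_variable: pretend this creates `name` on `device`.
    return (name, device)

def low_level_variable(name, device='/gpu:0'):
    # Stand-in for the low-level variable op the question scoped by mistake.
    return (name, device)

# Scoping the function the model actually calls works:
with arg_scope([model_variable], device='/cpu:0'):
    print(model_variable('w'))   # ('w', '/cpu:0')

# Scoping a *different* function object changes nothing -- the scope
# key never matches, which mirrors the behavior in the question:
with arg_scope([low_level_variable], device='/cpu:0'):
    print(model_variable('w'))   # ('w', '/gpu:0')
```

This is why switching the scoped functions to `slim.model_variable` and `slim.variable`, as `model_deploy` does, makes the `device='/cpu:0'` override actually reach the variables the Inception model creates.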

Juan Terven