
I have two questions:

(1) How does TensorFlow allocate GPU memory when using only one GPU? I have an implementation of a 2D convolution like this (the whole model runs on the GPU):

def _conv(self, name, x, filter_size, in_filters, out_filters, strides):
    with tf.variable_scope(name):
        n = filter_size * filter_size * out_filters
        kernel = tf.get_variable(
            '', [filter_size, filter_size, in_filters, out_filters], tf.float32,
            initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / n)),
        )
        return tf.nn.conv2d(x, kernel, strides, padding='SAME')
        # another option
        # x = tf.nn.conv2d(x, kernel, strides, padding='SAME')
        # return x

The alternative in the comments performs the same operation, but first binds the result to a new Python variable x. In this case, will TF allocate more GPU memory?

(2) When using multiple GPUs, I'd like to use a list to gather the results from the GPUs. The implementation is below:

def _conv(self, name, input, filter_size, in_filters, out_filters, strides, trainable=True):
    assert type(input) is list
    assert len(input) == FLAGS.gpu_num

    n = filter_size * filter_size * out_filters
    output = []
    for i in range(len(input)):
        with tf.device('/gpu:%d' % i):
            with tf.variable_scope(name, reuse=i > 0):
                kernel = tf.get_variable(
                    '', [filter_size, filter_size, in_filters, out_filters], tf.float32,
                    initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / n))
                )
                output.append(tf.nn.conv2d(input[i], kernel, strides, padding='SAME'))

    return output

Will TF allocate more memory because the outputs are gathered in a list? Is output (the list) attached to some GPU device? I ask because when I train the CNN on two GPUs with this implementation, the program uses much more GPU memory than with one GPU. I think there is something I missed or misunderstood.

LI Xuhong
  • Have you had a look at the GPU options page on the [Tensorflow website here](https://www.tensorflow.org/tutorials/using_gpu#allowing_gpu_memory_growth)? As I understand it, TensorFlow tends to grab as much GPU memory as it can and will then manage it itself. This can be slightly frustrating when trying to quickly monitor with nvidia-smi. However, you can allow GPU memory growth, which means it only takes what it needs and can keep taking more. You can also set TF to take only a fraction of the memory (a minimal config sketch follows these comments). See if that linked page answers any of your questions. – JCooke Jun 12 '17 at 10:16
  • Thanks for your comment, but that link didn't answer my questions. Within the GPU memory grabbed by TF, one part is "necessary" memory and another part is for performance gains. When GPU memory is not sufficient, TF allocates only the necessary memory (maybe there is something more complex behind the scenes, which is why we see warnings about running out of memory but not a failure). The memory mentioned in my question means this necessary memory, not the total memory grabbed by TF. – LI Xuhong Jun 12 '17 at 12:41
  • Ahhh ok, I see. So you're more interested in how the memory is acquired and what for? I'm afraid I'm not qualified enough to give you a reasonable answer. Hopefully someone else can help you! – JCooke Jun 12 '17 at 13:05
  • Thank you. I'd like to use two GPUs and gather the intermediate results from them to do a real normalization (another question [here](https://stackoverflow.com/questions/43056966/ways-to-implement-multi-gpu-bn-layers-with-synchronizing-means-and-vars)). But after changing input and output to `list`, the GPU memory usage becomes strange. That's why I want to know how the memory is allocated in TF. – LI Xuhong Jun 12 '17 at 16:18
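
To make the suggestion in the first comment concrete, here is a minimal sketch of the TF 1.x session options it mentions (allow_growth and the per-process memory fraction); the fraction value is only an illustration:

import tensorflow as tf

config = tf.ConfigProto()
# grab GPU memory on demand instead of reserving nearly all of it up front
config.gpu_options.allow_growth = True
# alternatively, cap the allocator at a fixed fraction of each GPU's memory (value is illustrative)
# config.gpu_options.per_process_gpu_memory_fraction = 0.4

sess = tf.Session(config=config)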

1 Answer


Use this code to check each tensor and the device it is placed on:

# print every node in the default graph together with the device it is placed on
for n in tf.get_default_graph().as_graph_def().node:
    print(n.name, n.device)

So the answers to the two questions:

(1) No.
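
As an illustrative check (a minimal sketch with hypothetical shapes, not the code from the question), counting graph nodes shows that binding the conv output to an extra Python variable adds nothing to the graph, so no extra GPU memory is needed for it:

import tensorflow as tf

inp = tf.placeholder(tf.float32, [None, 32, 32, 3])   # hypothetical input
kernel = tf.get_variable('kernel', [3, 3, 3, 16], tf.float32)

before = len(tf.get_default_graph().as_graph_def().node)
y = tf.nn.conv2d(inp, kernel, [1, 1, 1, 1], padding='SAME')   # adds one Conv2D node
after_conv = len(tf.get_default_graph().as_graph_def().node)
z = y                                                          # pure Python aliasing, no new op
after_alias = len(tf.get_default_graph().as_graph_def().node)

print(after_conv - before)       # 1: the Conv2D node
print(after_alias - after_conv)  # 0: the extra Python name adds nothing to the graph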

(2) If I want to gather the intermediate data across GPUs and that data is used to compute gradients, there will be extra memory use: computing the gradients consumes memory too, and whenever data produced on one GPU is accessed on another, additional memory is allocated for the copy.
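
To see where the ops of such a multi-GPU graph are actually placed, a session can be created with device-placement logging turned on. This is a minimal sketch, assuming two visible GPUs; the shapes are arbitrary:

import tensorflow as tf

# minimal sketch, assuming two visible GPUs: log_device_placement prints the
# device assigned to every op when the session runs
parts = []
for i in range(2):
    with tf.device('/gpu:%d' % i):
        parts.append(tf.random_normal([1024, 1024]))

with tf.device('/gpu:0'):
    # consumes parts[1], which lives on /gpu:1, so an extra copy of that
    # tensor has to be materialized on /gpu:0
    total = tf.add_n(parts)

config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(total)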

LI Xuhong