I have two questions:
(1) How does TensorFlow allocate GPU memory when using only one GPU? I have an implementation of 2D convolution like this (the whole model runs on the GPU):
def _conv(self, name, x, filter_size, in_filters, out_filters, strides):
    with tf.variable_scope(name):
        n = filter_size * filter_size * out_filters
        kernel = tf.get_variable(
            '', [filter_size, filter_size, in_filters, out_filters], tf.float32,
            initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / n)),
        )
        return tf.nn.conv2d(x, kernel, strides, padding='SAME')
        # another option
        # x = tf.nn.conv2d(x, kernel, strides, padding='SAME')
        # return x
The other option in the comments performs the same operation, but assigns the result to a new Python variable x before returning it. In this case, will TF allocate more GPU memory?
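For reference, a minimal way to compare the two variants is to build each one in a fresh graph and count the ops it adds (just a sketch; the shapes and the 'DW' kernel name are made up for illustration):

import numpy as np
import tensorflow as tf

# Build each variant in its own graph and count the ops it creates (TF 1.x).
def variant_return_directly(x, kernel, strides):
    return tf.nn.conv2d(x, kernel, strides, padding='SAME')

def variant_assign_first(x, kernel, strides):
    x = tf.nn.conv2d(x, kernel, strides, padding='SAME')  # result bound to a Python name first
    return x

for variant in (variant_return_directly, variant_assign_first):
    graph = tf.Graph()
    with graph.as_default():
        x = tf.placeholder(tf.float32, [None, 32, 32, 3])
        kernel = tf.get_variable(
            'DW', [3, 3, 3, 16], tf.float32,
            initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / (3 * 3 * 16))))
        variant(x, kernel, [1, 1, 1, 1])
    print(variant.__name__, len(graph.get_operations()))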
(2) How does TensorFlow allocate GPU memory when using multiple GPUs? I'd like to use a list to gather the results from the GPUs. The implementation is below:
def _conv(self, name, input, filter_size, in_filters, out_filters, strides, trainable=True):
    assert type(input) is list
    assert len(input) == FLAGS.gpu_num
    n = filter_size * filter_size * out_filters
    output = []
    for i in range(len(input)):
        with tf.device('/gpu:%d' % i):
            with tf.variable_scope(name, reuse=i > 0):
                kernel = tf.get_variable(
                    '', [filter_size, filter_size, in_filters, out_filters], tf.float32,
                    initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / n))
                )
                output.append(tf.nn.conv2d(input[i], kernel, strides, padding='SAME'))
    return output
Will TF allocate more memory because of the use of the list? Is output (the list) attached to some GPU device? I ask because when I train the CNN on two GPUs with this implementation, the program uses much more GPU memory than with one GPU. I think there is something I have missed or misunderstood.
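For reference, here is a self-contained sketch of how the two-GPU path can be exercised and where each resulting tensor and variable ends up (the shapes, the 'conv1' scope, and the 'DW' kernel name are made up; the loop just mirrors the _conv above):

import numpy as np
import tensorflow as tf

num_gpus = 2  # stands in for FLAGS.gpu_num
inputs = [tf.placeholder(tf.float32, [None, 32, 32, 3], name='x_gpu%d' % i)
          for i in range(num_gpus)]

# Mirror of the multi-GPU _conv: one conv op per GPU, kernel variable shared via reuse.
output = []
for i in range(num_gpus):
    with tf.device('/gpu:%d' % i):
        with tf.variable_scope('conv1', reuse=i > 0):
            kernel = tf.get_variable(
                'DW', [3, 3, 3, 16], tf.float32,
                initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / (3 * 3 * 16))))
            output.append(tf.nn.conv2d(inputs[i], kernel, [1, 1, 1, 1], padding='SAME'))

# Inspect where each op and variable was placed.
for t in output:
    print(t.name, t.device)           # device of the op that produced each output tensor
for v in tf.global_variables():
    print(v.name, v.device)           # where the kernel variable itself lives

config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    feed = {p: np.zeros((8, 32, 32, 3), np.float32) for p in inputs}
    sess.run(output, feed_dict=feed)  # watch nvidia-smi per GPU while this runs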