Consider the following two-line Python/TensorFlow interactive session:
    import tensorflow as tf
    s = tf.Session()
If these commands are executed on an Ubuntu Linux 14.04 machine with 32G of physical memory and two GPUs (a GTX Titan X and a GTX 970), using Anaconda Python 2.7.13 and TensorFlow r1.3 (compiled from source), and `CUDA_VISIBLE_DEVICES` is not set (i.e. both GPUs are visible), the resulting Python process has 59.7G of memory allocated! Note that it only actually uses 754M.
If `CUDA_VISIBLE_DEVICES=0` (i.e. only the Titan X is visible), then 55.2G is allocated and 137M is in use.

If `CUDA_VISIBLE_DEVICES=1` (i.e. only the 970 is visible), then 47.0G is allocated and 325M is in use.

If `CUDA_VISIBLE_DEVICES=` (i.e. neither GPU is visible), then only 2.5G is allocated and only 131M is in use.
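For reference, the device visibility used in these trials can also be selected from inside the script, as long as the environment variable is set before TensorFlow is imported. A minimal sketch (the values mirror the cases above):

    import os

    # Must be set before `import tensorflow`, because the CUDA runtime reads
    # CUDA_VISIBLE_DEVICES when it initialises.
    # ""  -> neither GPU visible, "0" -> Titan X only, "1" -> 970 only
    os.environ["CUDA_VISIBLE_DEVICES"] = ""

    import tensorflow as tf
    s = tf.Session()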
This is a problem in environments where the amount of allocated memory is constrained, e.g. inside a grid engine setup.
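To make that concrete: a grid engine memory request is commonly enforced as an address-space limit (RLIMIT_AS), which counts virtual reservations rather than resident memory. The sketch below (the 8G figure is arbitrary, chosen only for illustration) mimics such a limit:

    import resource

    # Hypothetical illustration: impose an 8G address-space limit, roughly
    # what a scheduler enforcing a memory request via RLIMIT_AS would do.
    limit = 8 * 1024 ** 3
    resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

    # With both GPUs visible, session creation reserves ~60G of address
    # space, so it is expected to fail under this limit even though far
    # less memory is actually resident.
    import tensorflow as tf
    s = tf.Session()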
Is there any way to limit the amount of main memory that TensorFlow allocates when it is using CUDA?
Update 1
The amount of memory allocated is determined, in these trials, by looking at the `VIRT` column in `htop`.
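The same numbers can also be read programmatically. A Linux-only sketch (the `vm_usage` helper is just written for this question) that reads `VmSize` (htop's `VIRT`) and `VmRSS` from `/proc`:

    def vm_usage(pid="self"):
        """Return (VmSize, VmRSS) in kB, read from /proc/<pid>/status."""
        sizes = {}
        with open("/proc/%s/status" % pid) as f:
            for line in f:
                if line.startswith(("VmSize:", "VmRSS:")):
                    key, value = line.split(":", 1)
                    sizes[key] = int(value.split()[0])  # value looks like "  123456 kB"
        return sizes.get("VmSize"), sizes.get("VmRSS")

    import tensorflow as tf
    s = tf.Session()
    print(vm_usage())  # virtual and resident size, in kB, of this process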
TensorFlow r1.3 is compiled with mostly default `configure` answers. The only variations are the paths to CUDA and cuDNN. As a result, `jemalloc` is being used.
Update 2
I've tried recompiling with `jemalloc` disabled and see the same behaviour.