I'm running this simple piece of code, which copies two large arrays to GPU memory and computes their dot product.
import numpy as np
import theano
import theano.tensor as T

# two large float32 matrices (~1.6 GB and ~3.2 GB)
a = np.asarray(np.random.uniform(-1, 1, (10000, 40000)), dtype=np.float32)
b = np.asarray(np.random.uniform(-1, 1, (40000, 20000)), dtype=np.float32)

# copy them to GPU memory as shared variables
aa = theano.shared(a)
bb = theano.shared(b)

x = T.matrix()
y = T.matrix()
dd = T.dot(x, y)
ddd = theano.function([], dd, givens=[(x, aa), (y, bb)])
ddd()
However, it seems that after each code run (in Spyder), the GPU memory allocated for these shared variables is not freed.
I'm using a Titan X with 12 GB of memory, and I can only run this code twice; the third run produces the following error:
MemoryError: ('Error allocating 3200000000 bytes of device memory (CNMEM_STATUS_OUT_OF_MEMORY).', "you might consider using 'theano.shared(..., borrow=True)'")
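The 3200000000 bytes in the error message is exactly the size of bb (40000 × 20000 float32 values). A quick back-of-the-envelope calculation (my own arithmetic, not taken from any log) shows why the runs stop fitting if nothing is freed in between:

```python
# Rough accounting for the allocations in the snippet above
# (float32 = 4 bytes per element).
GB = 1e9

a_bytes = 10000 * 40000 * 4    # shared variable aa
b_bytes = 40000 * 20000 * 4    # shared variable bb -> 3200000000, matching the error
out_bytes = 10000 * 20000 * 4  # result of T.dot(x, y)

per_run = a_bytes + b_bytes + out_bytes  # ~5.6 GB per run

# If nothing is freed between runs, two runs already pin roughly 11.2 GB,
# so a third run cannot allocate another 3.2 GB copy of bb on a 12 GB card.
print(a_bytes / GB, b_bytes / GB, out_bytes / GB, per_run / GB)
```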
Here's my theanorc file:
[blas]
ldflags =
[nvcc]
flags=-LC:\Anaconda\libs
compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin
[global]
device = gpu0
floatX = float32
print_active_device = True
optimizer_including = cudnn
allow_gc = True
[lib]
cnmem = 0.8
[dnn]
enabled = True
conv.algo_fwd = time_on_shape_change
conv.algo_bwd_filter = time_on_shape_change
conv.algo_bwd_data = time_on_shape_change
I'm monitoring GPU memory usage with nvidia-smi, and I can see that all 12 GB of memory are consumed after three runs.