I am implementing some deep learning algorithms using Theano. Occasionally, after I stop programs that were running Theano, the following error appears when I try to import Theano again.

    >>> import theano
    ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device gpu failed:
    initCnmem: cnmemInit call failed! Reason=CNMEM_STATUS_OUT_OF_MEMORY. numdev=1

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/jjhu/.local/lib/python2.7/site-packages/theano/__init__.py", line 118, in <module>
        theano.sandbox.cuda.tests.test_driver.test_nvidia_driver1()
      File "/home/jjhu/.local/lib/python2.7/site-packages/theano/sandbox/cuda/tests/test_driver.py", line 40, in test_nvidia_driver1
        if not numpy.allclose(f(), a.sum()):
      File "/home/jjhu/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 875, in __call__
        storage_map=getattr(self.fn, 'storage_map', None))
      File "/home/jjhu/.local/lib/python2.7/site-packages/theano/gof/link.py", line 317, in raise_with_op
        reraise(exc_type, exc_value, exc_trace)
      File "/home/jjhu/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 862, in __call__
        self.fn() if output_subset is None else\
    RuntimeError: Cuda error: kernel_reduce_ccontig_node_4894639462a290346189bb38dab7bb7e_0: out of memory. (grid: 1 x 1; block: 256 x 1 x 1)

    Apply node that caused the error: GpuCAReduce{add}{1}(<CudaNdarrayType(float32, vector)>)
    Toposort index: 0
    Inputs types: [CudaNdarrayType(float32, vector)]
    Inputs shapes: [(10000,)]
    Inputs strides: [(1,)]
    Inputs values: ['not shown']
    Outputs clients: [[HostFromGpu(GpuCAReduce{add}{1}.0)]]

    HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
    HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

I searched for solutions. One suggestion was to remove the compilation folder with `rm -rf ~/.theano`. I also checked that the owner of `~/.theano` is not the root user, and I tried setting my `~/.theanorc` as follows. None of these worked for me.

    [global]
    floatX = float32
    device = cpu
    optimizer=fast_run

    [lib]
    cnmem = 0.1

    [cuda]
    root = /usr/local/cuda
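
For comparison, the same settings can also be forced for a single process through the `THEANO_FLAGS` environment variable, which takes precedence over the config file. The following is only a minimal sketch of that idea: the helper name `build_theano_flags` is hypothetical (it is not part of Theano), and the flag values simply mirror the config above.

```python
import os


def build_theano_flags(pairs):
    """Join (name, value) pairs into the comma-separated THEANO_FLAGS format.

    `build_theano_flags` is a hypothetical helper for illustration only.
    """
    return ",".join("{}={}".format(name, value) for name, value in pairs)


# Values mirror the config file above; section options use "section.option".
flags = build_theano_flags([
    ("device", "cpu"),
    ("floatX", "float32"),
    ("lib.cnmem", 0.1),
])

# The variable has to be set before Theano is imported in this process.
os.environ["THEANO_FLAGS"] = flags

# import theano  # would now be configured from THEANO_FLAGS
```

If the import still fails with a GPU error under these flags, that would suggest the flags are not reaching the Python process that performs the import.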

The only thing that works is rebooting the machine or logging out, which is very inconvenient. I don't know what causes this problem. Can anyone suggest a solution?

Jun
  • Well, the error suggests that the GPU is out of memory. Are you sure it is not the case? Which GPU are you using? What is the result of `nvidia-smi` (if applicable)? Anyway, this error shouldn't happen with `device=cpu` in `~/.theanorc` -- does it happen when you start python as: `THEANO_FLAGS=device=cpu python`? – sygi Nov 10 '16 at 12:11
  • Thanks for your reply. I have a GTX 1080 (8GB) installed in my machine, and device=gpu in my ~/.theanorc. I don't start python with THEANO_FLAGS=device=cpu. This annoying problem happens even when I just try to import the theano library; I haven't even used the GPU for any computation yet. As I described above, this problem happens after I stop some running programs that use theano. I suspect some memory caches are not released. Only when I restart my machine is the GPU memory reset to normal. Do you know how to solve it? – Jun Nov 12 '16 at 00:15
  • Have you taken a look at the `nvidia-smi` results? It will tell you how much memory is used at a given moment. – sygi Nov 12 '16 at 09:39
  • Sorry for the late response. Yes, I used nvidia-smi to look at the result, but it shows that my GPU usage is just around 30%. – Jun Nov 16 '16 at 17:05
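
The `nvidia-smi` check discussed in the comments can also be scripted to list which processes currently hold GPU memory, which may help spot a stopped program that never fully exited. This is a rough sketch, not a confirmed diagnosis of the problem above: it assumes `nvidia-smi` is on the `PATH`, and the function names `parse_compute_apps` and `gpu_processes` are hypothetical.

```python
import subprocess

# Ask nvidia-smi for one CSV row per compute process: pid, name, used memory.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(text):
    """Parse the CSV rows into (pid, process_name, used_memory) tuples.

    used_memory is kept as a string because the driver may report a
    placeholder instead of a number.
    """
    rows = []
    for line in text.splitlines():
        try:
            pid, rest = line.split(",", 1)
            name, mem = rest.rsplit(",", 1)
        except ValueError:
            continue  # not a "pid, name, memory" row
        pid = pid.strip()
        if pid.isdigit():
            rows.append((int(pid), name.strip(), mem.strip()))
    return rows


def gpu_processes():
    """Return parsed rows, or None if nvidia-smi could not be run."""
    try:
        out = subprocess.check_output(QUERY, universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_compute_apps(out)


if __name__ == "__main__":
    procs = gpu_processes()
    if procs is None:
        print("nvidia-smi is not available")
    else:
        for pid, name, mem in procs:
            print("{} {} {} MiB".format(pid, name, mem))
```

If a PID from an earlier, supposedly stopped run shows up here, terminating that process is one thing to try before rebooting.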

0 Answers