
I have the following code, based on a Theano example:

from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
# A CPU-only graph contains plain Elemwise ops; on the GPU they become GpuElemwise
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

Now I test the code in two modes.

In GPU mode, I get this:

$ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python gpu.py
Using gpu device 0: Tesla C2075 (CNMeM is enabled with initial size: 95.0% of memory, cuDNN not available)
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.475526 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
  1.62323296]
Used the gpu

In CPU mode, I get this:

$ THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python gpu.py
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 5.221368 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761
  1.62323284]
Used the cpu

Notice two things: the GPU is indeed faster than the CPU (0.47 s vs 5.2 s), but at the same time the GPU run prints the "cuDNN not available" message.

My question is this: what is the effect of the absence of cuDNN? Is it harmful?
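
For reference, here is a minimal sketch of how I check whether Theano itself can see cuDNN (this assumes the old sandbox.cuda backend that device=gpu selects; the names may differ in other Theano versions):

# Sketch: ask Theano whether it can use cuDNN.
# Assumes the old device=gpu (sandbox.cuda) backend used above.
from theano.sandbox.cuda import dnn

if dnn.dnn_available():
    print("cuDNN is available")
else:
    # dnn_available records the reason for failure in its .msg attribute
    print("cuDNN is NOT available:", dnn.dnn_available.msg)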

neversaint
It's been a little while since I last used Theano, but I think you may be using the general CUDA libraries, so you are getting the benefits of GPU acceleration, but not the more specialized cuDNN library. So if you are able to get cuDNN installed and working, you may be able to get some additional speedups. – Marius May 18 '16 at 01:12

1 Answer


If you don't use cuDNN, your code does not use the full power of the GPU. The advantage of a GPU over a CPU is that a GPU has a lot of real cores (from about 700 up to 4000), while an ordinary CPU has 1 to 8.

But GPU cores can only do primitive calculations. If you don't use cuDNN, other standard libraries do the calculations, or possibly (I don't know exactly) the code only uses GPU memory and does the calculation on the plain CPU.

cuDNN is a GPU-accelerated library of primitives for deep neural networks. This means that if you start building a deep neural network application without it, the application will not be as fast as it could be.
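
For example, here is a minimal sketch of how you can see whether cuDNN is actually used: compile a small convolution (the kind of primitive cuDNN accelerates) and inspect the graph. The op names are from the old device=gpu backend and may differ by Theano version; with cuDNN you would expect a GpuDnnConv node, without it a fallback such as GpuCorrMM (or a CPU op):

import numpy
import theano
import theano.tensor as T
from theano.tensor.nnet import conv2d

# A small 2-D convolution: the kind of primitive cuDNN accelerates.
images = T.tensor4('images')
filters = theano.shared(
    numpy.random.randn(8, 3, 5, 5).astype(theano.config.floatX))
f = theano.function([images], conv2d(images, filters))

# Print which ops the convolution compiled to; look for GpuDnnConv here.
print([node.op for node in f.maker.fgraph.toposort()])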

Please read NVIDIA's cuDNN page: https://developer.nvidia.com/cudnn

Note: as I wrote above, GPU cores can only do primitive calculations. So if you choose to use the GPU but call a function that is not supported on the GPU, Theano temporarily switches the application to the CPU for that function (and this switch takes time).
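
As a rough sketch of how to spot those temporary switches, you can count the GPU<->CPU transfer nodes in a compiled graph (HostFromGpu and GpuFromHost are op classes from the old sandbox.cuda backend; this assumes device=gpu, as in the question):

from theano import function, shared, config
import theano.tensor as T
import numpy

# Transfer ops from the old sandbox.cuda (device=gpu) backend.
from theano.sandbox.cuda.basic_ops import HostFromGpu, GpuFromHost

x = shared(numpy.ones(1000, dtype=config.floatX))
f = function([], T.exp(x))

# Each HostFromGpu/GpuFromHost node is a GPU<->CPU copy; many of these
# inside a hot loop usually means part of the graph fell back to the CPU.
transfers = [node for node in f.maker.fgraph.toposort()
             if isinstance(node.op, (HostFromGpu, GpuFromHost))]
print("GPU<->CPU transfer nodes:", len(transfers))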