I am using Theano and Lasagne for a DNN speech enhancement project. I use a feed-forward network very similar to the mnist example in the Lasagne documentation (https://github.com/Lasagne/Lasagne/blob/master/examples/mnist.py). This network uses several dropout layers. I train my network on an Nvidia Titan X GPU. When I do not use dropout, my GPU utilization is approximately 60% and one epoch takes around 60 s, but when I use dropout, my GPU utilization drops to 8% and each epoch takes approximately 600 s. This happens regardless of whether the dropout rate is set to 20% or 0.1%.
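For context, here is a stripped-down sketch of how I build the network. The layer sizes and dropout rates below are placeholders, not my real configuration; the structure follows the mnist.py example:

```python
import lasagne
from lasagne.layers import InputLayer, DenseLayer, DropoutLayer
import theano.tensor as T


def build_network(input_var=None):
    # Placeholder sizes; my actual feature dimension and unit counts differ.
    network = InputLayer(shape=(None, 257), input_var=input_var)
    network = DropoutLayer(network, p=0.2)            # input dropout
    network = DenseLayer(network, num_units=1024)     # default rectify nonlinearity
    network = DropoutLayer(network, p=0.2)            # hidden dropout
    network = DenseLayer(network, num_units=1024)
    network = DropoutLayer(network, p=0.2)
    network = DenseLayer(network, num_units=257,
                         nonlinearity=lasagne.nonlinearities.linear)
    return network


input_var = T.matrix('inputs')
network = build_network(input_var)
```

Removing the DropoutLayer lines (or setting deterministic=True when getting the output) is what gives me the 60% utilization / 60 s epochs mentioned above.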
Initially I thought the slowdown was caused by the random number generator (RNG) used to generate the dropout mask not running on the GPU. However, in the Lasagne source code (https://github.com/Lasagne/Lasagne/blob/master/lasagne/layers/noise.py) it seems that rng_mrg is used, which should run on the GPU according to this link: http://deeplearning.net/software/theano/tutorial/examples.html#other-implementations
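If I read the source correctly, the relevant part of noise.py is roughly the following (shortened and paraphrased by me):

```python
# Paraphrased/shortened excerpt from lasagne/layers/noise.py
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
from lasagne.random import get_rng
from lasagne.layers.base import Layer


class DropoutLayer(Layer):
    def __init__(self, incoming, p=0.5, rescale=True, **kwargs):
        super(DropoutLayer, self).__init__(incoming, **kwargs)
        # The dropout mask is drawn from an MRG stream, which as far as I
        # understand should have a GPU implementation.
        self._srng = RandomStreams(get_rng().randint(1, 2147462579))
        self.p = p
        self.rescale = rescale

    def get_output_for(self, input, deterministic=False, **kwargs):
        if deterministic or self.p == 0:
            return input
        retain_prob = 1 - self.p
        if self.rescale:
            input /= retain_prob
        return input * self._srng.binomial(input.shape, p=retain_prob,
                                           dtype=input.dtype)
```

So the mask generation goes through MRG_RandomStreams.binomial, which is why I expected it to stay on the GPU.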
Running the Theano profiler shows that "theano.sandbox.rng_mrg.mrg_uniform" takes up 86.7% of the execution time, which I do not understand.
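For what it is worth, this is how I collect the profile. It uses Theano's standard profiling support (profile=True on theano.function, or equivalently THEANO_FLAGS=profile=True); the loss, optimizer, and dummy data below are just stand-ins for my real training code:

```python
import numpy as np
import theano
import theano.tensor as T
import lasagne

input_var = T.matrix('inputs')
target_var = T.matrix('targets')
network = build_network(input_var)   # the sketch from above

prediction = lasagne.layers.get_output(network)
loss = lasagne.objectives.squared_error(prediction, target_var).mean()
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.adam(loss, params)

# profile=True attaches Theano's ProfileStats to the compiled function
train_fn = theano.function([input_var, target_var], loss,
                           updates=updates, profile=True)

# Dummy batches just to exercise the function; my real data comes from files.
x = np.random.randn(128, 257).astype(theano.config.floatX)
y = np.random.randn(128, 257).astype(theano.config.floatX)
for _ in range(100):
    train_fn(x, y)

# The per-op breakdown printed here is where
# theano.sandbox.rng_mrg.mrg_uniform shows up with 86.7% of the time.
train_fn.profile.summary()
```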
If anyone has an idea of what is killing my GPU utilization, I would appreciate any pointers.