I am using Theano and Lasagne for a DNN speech enhancement project. I use a feed-forward network very similar to the mnist example in the Lasagne documentation (https://github.com/Lasagne/Lasagne/blob/master/examples/mnist.py). This network uses several dropout layers. I train my network on an Nvidia Titan X GPU. When I do not use dropout, my GPU utilization is approximately 60% and one epoch takes around 60 s, but when I use dropout, my GPU utilization drops to 8% and each epoch takes approximately 600 s. This happens regardless of whether the dropout rate is set to 20% or 0.1%.
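For context, here is a stripped-down sketch of how I build the network. The layer sizes and dropout rates below are placeholders, not my real configuration; the structure follows the mnist.py example:

```python
import lasagne
from lasagne.layers import InputLayer, DenseLayer, DropoutLayer
import theano.tensor as T


def build_network(input_var=None):
    # Placeholder sizes; my actual feature dimension and unit counts differ.
    network = InputLayer(shape=(None, 257), input_var=input_var)
    network = DropoutLayer(network, p=0.2)            # input dropout
    network = DenseLayer(network, num_units=1024)     # default rectify nonlinearity
    network = DropoutLayer(network, p=0.2)            # hidden dropout
    network = DenseLayer(network, num_units=1024)
    network = DropoutLayer(network, p=0.2)
    network = DenseLayer(network, num_units=257,
                         nonlinearity=lasagne.nonlinearities.linear)
    return network


input_var = T.matrix('inputs')
network = build_network(input_var)
```

Removing the DropoutLayer lines (or setting deterministic=True when getting the output) is what gives me the 60% utilization / 60 s epochs mentioned above.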
Initially I thought the slowdown was caused by the random number generator (RNG) used to generate the dropout mask not running on the GPU. However, in the Lasagne source code (https://github.com/Lasagne/Lasagne/blob/master/lasagne/layers/noise.py) it seems that rng_mrg is used, which should run on the GPU according to this link: http://deeplearning.net/software/theano/tutorial/examples.html#other-implementations
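If I read the source correctly, the relevant part of noise.py is roughly the following (shortened and paraphrased by me):

```python
# Paraphrased/shortened excerpt from lasagne/layers/noise.py
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
from lasagne.random import get_rng
from lasagne.layers.base import Layer


class DropoutLayer(Layer):
    def __init__(self, incoming, p=0.5, rescale=True, **kwargs):
        super(DropoutLayer, self).__init__(incoming, **kwargs)
        # The dropout mask is drawn from an MRG stream, which as far as I
        # understand should have a GPU implementation.
        self._srng = RandomStreams(get_rng().randint(1, 2147462579))
        self.p = p
        self.rescale = rescale

    def get_output_for(self, input, deterministic=False, **kwargs):
        if deterministic or self.p == 0:
            return input
        retain_prob = 1 - self.p
        if self.rescale:
            input /= retain_prob
        return input * self._srng.binomial(input.shape, p=retain_prob,
                                           dtype=input.dtype)
```

So the mask generation goes through MRG_RandomStreams.binomial, which is why I expected it to stay on the GPU.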
Running the Theano profiler shows that "theano.sandbox.rng_mrg.mrg_uniform" takes up 86.7% of the execution time, which I do not understand.
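For what it is worth, this is how I collect the profile. It uses Theano's standard profiling support (profile=True on theano.function, or equivalently THEANO_FLAGS=profile=True); the loss, optimizer, and dummy data below are just stand-ins for my real training code:

```python
import numpy as np
import theano
import theano.tensor as T
import lasagne

input_var = T.matrix('inputs')
target_var = T.matrix('targets')
network = build_network(input_var)   # the sketch from above

prediction = lasagne.layers.get_output(network)
loss = lasagne.objectives.squared_error(prediction, target_var).mean()
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.adam(loss, params)

# profile=True attaches Theano's ProfileStats to the compiled function
train_fn = theano.function([input_var, target_var], loss,
                           updates=updates, profile=True)

# Dummy batches just to exercise the function; my real data comes from files.
x = np.random.randn(128, 257).astype(theano.config.floatX)
y = np.random.randn(128, 257).astype(theano.config.floatX)
for _ in range(100):
    train_fn(x, y)

# The per-op breakdown printed here is where
# theano.sandbox.rng_mrg.mrg_uniform shows up with 86.7% of the time.
train_fn.profile.summary()
```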
If anyone has an idea of what is killing my GPU utilization, I would appreciate any pointers.