I am compiling a Git version of the MXNet framework, which uses cuDNN internally. When MXNet is compiled in debug mode, my example test runs fine and my neural network trains. However, when I switch to release mode, the execution fails a check and I get the following error:
Check failed: e == CUDNN_STATUS_SUCCESS (8 vs. 0) cuDNN: CUDNN_STATUS_EXECUTION_FAILED
Note: I don't see any release-specific or debug-specific code that could explain the different behaviour. I also had no problems at all with either the release or the debug build until I enabled cuDNN, so I believe it is the culprit.
The symptoms:
- The code doesn't necessarily crash at the same location, but it always crashes during a CUDNN_CALL (a macro that calls a cuDNN function and checks the returned status).
- No memory is allocated on my GPU, which in any case has enough memory for such a network, so memory shouldn't be the problem.
- It happens only in release mode; in debug mode it runs just fine.
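A status-checking macro of the kind described in the symptoms can be sketched in plain C++. This is not MXNet's actual CUDNN_CALL definition: to keep the sketch self-contained and runnable without a GPU, it uses a hypothetical Status enum and status_name function in place of cudnnStatus_t and cudnnGetErrorString, and it throws an exception where a real macro would more likely log a fatal error. The numeric codes mirror the error above (8 for an execution failure, 0 for success).

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for cudnnStatus_t; only a few codes are mirrored,
// using the same numeric values as cuDNN (SUCCESS = 0, EXECUTION_FAILED = 8).
enum Status {
  STATUS_SUCCESS = 0,
  STATUS_BAD_PARAM = 3,
  STATUS_EXECUTION_FAILED = 8
};

// Hypothetical stand-in for cudnnGetErrorString.
inline const char* status_name(Status s) {
  switch (s) {
    case STATUS_SUCCESS: return "STATUS_SUCCESS";
    case STATUS_BAD_PARAM: return "STATUS_BAD_PARAM";
    case STATUS_EXECUTION_FAILED: return "STATUS_EXECUTION_FAILED";
  }
  return "UNKNOWN_STATUS";
}

// Evaluate the wrapped call exactly once, compare its result with the success
// code, and report both numeric values in the "(e vs. 0)" form seen above.
#define STATUS_CALL(func)                                                  \
  do {                                                                     \
    Status e = (func);                                                     \
    if (e != STATUS_SUCCESS) {                                             \
      std::ostringstream msg;                                              \
      msg << "Check failed: e == STATUS_SUCCESS (" << static_cast<int>(e)  \
          << " vs. " << static_cast<int>(STATUS_SUCCESS) << ") "           \
          << status_name(e);                                               \
      throw std::runtime_error(msg.str());                                 \
    }                                                                      \
  } while (0)

// Hypothetical library functions used only to exercise the macro.
inline Status fake_success() { return STATUS_SUCCESS; }
inline Status fake_failure() { return STATUS_EXECUTION_FAILED; }

// Returns the macro's error message, or an empty string if the call succeeded.
inline std::string run_checked(Status (*fn)()) {
  try {
    STATUS_CALL(fn());
  } catch (const std::runtime_error& err) {
    return err.what();
  }
  return "";
}
```

Calling run_checked(fake_failure) produces a message in the same format as the error quoted above, while run_checked(fake_success) returns an empty string.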
Here is an example of where I get the error:
CUDNN_CALL(cudnnAddTensor(s->dnn_handle_,
                          &alpha,
                          bias_desc_,
                          bias.dptr_ + bias_offset_ * g,
                          &beta_add,
                          out_desc_,
                          out_ptr + out_offset_ * g));
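cuDNN documents cudnnAddTensor as computing C = alpha * A + beta * C, where each dimension of the bias tensor A either matches the output tensor C or equals 1. The pointer offsets scaled by g suggest the call above is made once per convolution group. The CPU reference below sketches that arithmetic for float data in NCHW layout; the group offsets it uses (channels_per_group bias values and channels_per_group * h * w output values per group, with the full tensor's per-sample stride) and the name add_bias_grouped are assumptions for illustration, not values taken from MXNet.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for a grouped bias add, following the
// "ptr + offset * g" pattern: for every group g, computes
//   out = alpha * bias[channel] + beta * out
// with the bias broadcast over the batch, height and width dimensions.
// Assumes NCHW layout and groups * channels_per_group output channels.
void add_bias_grouped(float alpha, const float* bias, float beta, float* out,
                      std::size_t n, std::size_t groups,
                      std::size_t channels_per_group, std::size_t h,
                      std::size_t w) {
  const std::size_t plane = h * w;                              // one channel
  const std::size_t bias_offset = channels_per_group;           // per group
  const std::size_t out_offset = channels_per_group * plane;    // per group
  const std::size_t sample_stride = groups * out_offset;        // per sample
  for (std::size_t g = 0; g < groups; ++g) {
    const float* b = bias + bias_offset * g;  // this group's bias slice
    float* o = out + out_offset * g;          // this group's output slice
    for (std::size_t i = 0; i < n; ++i) {
      for (std::size_t c = 0; c < channels_per_group; ++c) {
        for (std::size_t p = 0; p < plane; ++p) {
          float& v = o[i * sample_stride + c * plane + p];
          v = alpha * b[c] + beta * v;
        }
      }
    }
  }
}
```

With groups set to 1 this reduces to an ordinary per-channel bias add. Comparing an output copied back from the GPU against such a reference is one way to check that the descriptors and offsets describe the intended memory.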
So, what could be the causes of such a problem?