I changed the title from "CUDNN_STATUS_ALLOC_FAILED with minimal network and data using CUDA 10.0 and CuDNN 7.6.x" to better describe my problem.
I have a laptop with an NVIDIA GeForce RTX 2060 GPU, which should have the Turing architecture and compute capability 7.5: https://en.wikipedia.org/wiki/Turing_(microarchitecture)
According to the CuDNN support matrix, that GPU should be supported up to the most current CuDNN 7.6.3: https://docs.nvidia.com/deeplearning/sdk/cudnn-support-matrix/index.html
Having said that, this simple example code fails on one of my machines with tensorflow-gpu:
import numpy as np
from tensorflow.keras import layers, models, optimizers
model = models.Sequential()
model.add(layers.Conv1D(1, 3, input_shape=(8, 1)))
optimizer = optimizers.Adam(lr=1e-6)
model.compile(optimizer=optimizer, loss='mse')
x = np.zeros((1, *model.input.shape[1:]))
y = np.zeros((1, *model.output.shape[1:]))
model.fit(x, y)
The output, cleaned up quite a bit, is:
I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened
dynamic library cudart64_100.dll
I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened
dynamic library nvcuda.dll
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device
(/job:localhost/replica:0/task:0/device:GPU:0 with 4608 MB memory) -> physical GPU
(device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened
dynamic library cudnn64_7.dll
E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle:
CUDNN_STATUS_ALLOC_FAILED
W tensorflow/core/common_runtime/base_collective_executor.cc:216]
BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This
is probably because cuDNN failed to initialize, so try looking to see if a warning log
message was printed above.
[[{{node sequential/conv3d/Conv3D}}]]
Traceback (most recent call last):
File "NotWorking.py", line 12, in <module>
model.fit(x, y)
...
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution
algorithm. This is probably because cuDNN failed to initialize, so try looking to see
if a warning log message was printed above.
[[node sequential/conv3d/Conv3D (defined at C:\Users\bers\AppData\Local\Programs\
Python\Python37\lib\site-packages\tensorflow_core\python\framework\ops.py:1751) ]]
[Op:__inference_distributed_function_505]
Function call stack:
distributed_function
The problem is reproducible in tensorflow-gpu==1.14.0, 2.0.0rc0, and 2.0.0rc1 (installed via pip), with CUDA 10.0.130 and several versions of CuDNN [7.6.3.30, 7.6.2.24, 7.6.0.64] for CUDA 10.0 on Python 3.7.4.
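To rule out a mix-up between several installed packages, the build that Python actually imports can be checked directly. This is only a minimal sketch: `describe_tf_build` is a hypothetical helper name (not part of TensorFlow), and it reports only the package version and whether the build was compiled with CUDA support, not which CuDNN DLL gets loaded at runtime.

```python
def describe_tf_build():
    """Return the imported TensorFlow version and whether it is a CUDA build.

    Hypothetical helper for illustration; returns None values if
    TensorFlow cannot be imported in the current environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return {"version": None, "built_with_cuda": None}
    return {
        "version": tf.__version__,
        # True for tensorflow-gpu wheels, False for CPU-only wheels
        "built_with_cuda": tf.test.is_built_with_cuda(),
    }


print(describe_tf_build())
```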
Edit 1: I have already uninstalled and reinstalled TensorFlow, Python, and everything that looks NVIDIA-related on my system, including CUDA, CuDNN, and the graphics drivers, until Windows could only assign a basic VGA driver to the NVIDIA GPU. The problem does not appear with only tensorflow installed, nor on a different computer with a Quadro M5000 with CuDNN 7.6.2 on CUDA 10.0.
Edit 2: I have run a number of experiments using the code above. These are my results:
Failing:
1. Python 3.7.4, tensorflow-gpu==2.0.0rc0/2.0.0rc1 (compiled for CUDA 10.0, CuDNN 7.6.0), CUDA 10.0.130, CuDNN 7.6.3.30/7.6.2.24/7.6.0.64 for CUDA 10.0
2. Python 3.7.4, tensorflow-gpu==1.14.0/1.13.1 (compiled for CUDA 10.0, CuDNN 7.4.1), CUDA 10.0.130, CuDNN 7.6.0.64 for CUDA 10.0
3. Python 3.6.8, tensorflow-gpu==1.14.0/1.13.1 (compiled for CUDA 10.0, CuDNN 7.4.1), CUDA 10.0.130, CuDNN 7.6.3.30/7.6.0.64 for CUDA 10.0
Working:
4. Python 3.7.4, tensorflow-gpu==1.14.0/1.13.1 (compiled for CUDA 10.0, CuDNN 7.4.1), CUDA 10.0.130, CuDNN 7.5.1.10/7.4.2.24/7.4.1.5 for CUDA 10.0
5. Python 3.6.8, tensorflow-gpu==1.14.0/1.13.1 (compiled for CUDA 10.0, CuDNN 7.4.1), CUDA 10.0.130, CuDNN 7.5.1.10 for CUDA 10.0
6. Python 3.6.8, tensorflow-gpu==1.12.3/1.12.0 (compiled for CUDA 9.0), CUDA 9.0.176, CuDNN 7.6.3.30/7.6.0.64/7.5.1.10/7.1.4/7.0.5 for CUDA 9.0
So the problem is independent of the Python version (compare 2 vs. 3), independent of the exact version of tensorflow-gpu (compare 1 vs. 2), and also independent of the exact version of CuDNN that tensorflow-gpu is compiled for (compare 1 vs. 2).
So far, the problem appears only with CuDNN>=7.6.0 on CUDA 10.0: the latest CuDNN 7.5 works fine on CUDA 10.0 (see 4 and 5), while the earliest CuDNN 7.6 fails on CUDA 10.0 (see 2 and 3), and both work fine on CUDA 9.0 (see 6).
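To double-check that this rule really matches every run, the six configurations can be written down as data and compared against the hypothesis "fails if and only if CUDA is 10.0 and CuDNN is at least 7.6". This is just a sketch of the bookkeeping; the names `OBSERVATIONS`, `predicted_to_fail`, and `mismatches` are made up for illustration, and the table is transcribed from the results listed above.

```python
def parse(version):
    """Turn a dotted version string such as '7.6.0.64' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# (configuration number, CUDA version, CuDNN versions tried, observed outcome)
OBSERVATIONS = [
    (1, "10.0.130", ["7.6.3.30", "7.6.2.24", "7.6.0.64"], "fail"),
    (2, "10.0.130", ["7.6.0.64"], "fail"),
    (3, "10.0.130", ["7.6.3.30", "7.6.0.64"], "fail"),
    (4, "10.0.130", ["7.5.1.10", "7.4.2.24", "7.4.1.5"], "ok"),
    (5, "10.0.130", ["7.5.1.10"], "ok"),
    (6, "9.0.176", ["7.6.3.30", "7.6.0.64", "7.5.1.10", "7.1.4", "7.0.5"], "ok"),
]


def predicted_to_fail(cuda, cudnn):
    """Hypothesis: the crash needs CUDA 10.0 together with CuDNN >= 7.6."""
    return parse(cuda)[:2] == (10, 0) and parse(cudnn)[:2] >= (7, 6)


def mismatches():
    """List every (configuration, CUDA, CuDNN) run that contradicts the hypothesis."""
    found = []
    for number, cuda, cudnn_versions, outcome in OBSERVATIONS:
        for cudnn in cudnn_versions:
            predicted = "fail" if predicted_to_fail(cuda, cudnn) else "ok"
            if predicted != outcome:
                found.append((number, cuda, cudnn))
    return found


print(mismatches())
```

An empty list means none of the recorded runs contradicts the hypothesis. It says nothing about why the combination fails, or about the code variations that work regardless.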
Interestingly, the following variations of the code are also working fine on all system variations mentioned above:
model.add(layers.Conv1D(1, 3, input_shape=(3, 1))) # changed input shape
and
model.add(layers.Conv1D(1, 3, input_shape=(8, 1)))
model.add(layers.Dense(1)) # added Dense layer!
So in summary, some specific TensorFlow code (not even all TensorFlow code) fails only with CuDNN 7.6 on CUDA 10.0. Unfortunately, TensorFlow 2 has been compiled against CuDNN 7.6.0, so I am not able to run TF2 code.
What may be going on here?