Should I be able to use precompiled TensorFlow 2 on a laptop GeForce RTX 2060 GPU?

Question

I changed the title from "CUDNN_STATUS_ALLOC_FAILED with minimal network and data using CUDA 10.0 and CuDNN 7.6.x" to better describe my problem.

I have a laptop with an NVIDIA GeForce RTX 2060 GPU, which should have the Turing architecture and compute capability 7.5: https://en.wikipedia.org/wiki/Turing_(microarchitecture)

According to the CuDNN support matrix, that GPU should be supported up to the most current CuDNN, 7.6.3: https://docs.nvidia.com/deeplearning/sdk/cudnn-support-matrix/index.html

Having said that, this simple example code fails on one of my machines with tensorflow-gpu:

import numpy as np
from tensorflow.keras import layers, models, optimizers

model = models.Sequential()
model.add(layers.Conv1D(1, 3, input_shape=(8, 1)))


optimizer = optimizers.Adam(lr=1e-6)
model.compile(optimizer=optimizer, loss='mse')
x = np.zeros((1, *model.input.shape[1:]))
y = np.zeros((1, *model.output.shape[1:]))
model.fit(x, y)

The output, cleaned up quite a bit, is:

I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened 
  dynamic library cudart64_100.dll
I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened 
  dynamic library nvcuda.dll
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device 
  (/job:localhost/replica:0/task:0/device:GPU:0 with 4608 MB memory) -> physical GPU 
  (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened 
  dynamic library cudnn64_7.dll
E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: 
  CUDNN_STATUS_ALLOC_FAILED
W tensorflow/core/common_runtime/base_collective_executor.cc:216] 
  BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This 
  is probably because cuDNN failed to initialize, so try looking to see if a warning log 
  message was printed above.
     [[{{node sequential/conv3d/Conv3D}}]]
Traceback (most recent call last):
  File "NotWorking.py", line 12, in <module>
    model.fit(x, y)
...
tensorflow.python.framework.errors_impl.UnknownError:  Failed to get convolution 
  algorithm. This is probably because cuDNN failed to initialize, so try looking to see 
  if a warning log message was printed above.
     [[node sequential/conv3d/Conv3D (defined at C:\Users\bers\AppData\Local\Programs\
  Python\Python37\lib\site-packages\tensorflow_core\python\framework\ops.py:1751) ]] 
  [Op:__inference_distributed_function_505]

Function call stack:
distributed_function

The problem is reproducible in tensorflow-gpu==1.14.0, 2.0.0rc0, 2.0.0rc1 (installed via pip), with CUDA 10.0.130 and several versions of CuDNN [7.6.3.30, 7.6.2.24, 7.6.0.64] for CUDA 10.0 on Python 3.7.4.
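When swapping between several CuDNN versions like this, it can help to confirm which copies of the libraries are actually reachable. The following is a minimal sketch, not part of the original setup: the helper name `find_on_path` is hypothetical, the DLL names are taken from the log above, and it only inspects `PATH`, which is just one of the places Windows searches when loading a DLL.

```python
import os
from pathlib import Path


def find_on_path(filename, path=None):
    """Return the full path of every copy of `filename` found on the search path.

    `path` defaults to the PATH environment variable; passing it explicitly
    makes the helper easy to try out on a scratch directory.
    """
    raw = os.environ.get("PATH", "") if path is None else path
    hits = []
    for entry in raw.split(os.pathsep):
        if not entry:
            continue  # skip empty PATH entries
        candidate = Path(entry) / filename
        if candidate.is_file():
            hits.append(str(candidate))
    return hits


if __name__ == "__main__":
    # DLL names as they appear in the TensorFlow log above.
    for dll in ("cudart64_100.dll", "cudnn64_7.dll"):
        copies = find_on_path(dll)
        print(dll, "->", copies if copies else "not found on PATH")
```

More than one hit for `cudnn64_7.dll` would suggest that an older or newer copy is shadowing the one that was meant to be tested.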

Edit 1: I have already uninstalled and reinstalled TensorFlow, Python, and everything that looks NVIDIA-related on my system, including CUDA, CuDNN, and the graphics drivers, to the point where Windows could only assign a basic VGA driver to the NVIDIA GPU. The problem does not appear with only the tensorflow package (rather than tensorflow-gpu) installed, nor on a different computer with a Quadro M5000 running CuDNN 7.6.2 on CUDA 10.0.

Edit 2: I have run a number of experiments using the code above. These are my results:

Failing:

1. Python 3.7.4, tensorflow-gpu==2.0.0rc0/2.0.0rc1 (compiled for CUDA 10.0, CuDNN 7.6.0), CUDA 10.0.130, CuDNN 7.6.3.30/7.6.2.24/7.6.0.64 for CUDA 10.0
2. Python 3.7.4, tensorflow-gpu==1.14.0/1.13.1 (compiled for CUDA 10.0, CuDNN 7.4.1), CUDA 10.0.130, CuDNN 7.6.0.64 for CUDA 10.0
3. Python 3.6.8, tensorflow-gpu==1.14.0/1.13.1 (compiled for CUDA 10.0, CuDNN 7.4.1), CUDA 10.0.130, CuDNN 7.6.3.30/7.6.0.64 for CUDA 10.0

Working:

4. Python 3.7.4, tensorflow-gpu==1.14.0/1.13.1 (compiled for CUDA 10.0, CuDNN 7.4.1), CUDA 10.0.130, CuDNN 7.5.1.10/7.4.2.24/7.4.1.5 for CUDA 10.0
5. Python 3.6.8, tensorflow-gpu==1.14.0/1.13.1 (compiled for CUDA 10.0, CuDNN 7.4.1), CUDA 10.0.130, CuDNN 7.5.1.10 for CUDA 10.0
6. Python 3.6.8, tensorflow-gpu==1.12.3/1.12.0 (compiled for CUDA 9.0), CUDA 9.0.176, CuDNN 7.6.3.30/7.6.0.64/7.5.1.10/7.1.4/7.0.5 for CUDA 9.0

So the problem is independent of the Python version (compare 2 vs. 3) and independent of the exact version of tensorflow-gpu (compare 1 vs. 2), and therefore also independent of the exact version of CuDNN that tensorflow-gpu was compiled for (compare 1 vs. 2).

So far, the problem appears only with CuDNN >= 7.6.0 on CUDA 10.0. That is, the latest CuDNN 7.5 works fine on CUDA 10.0 (see 4 and 5), while the earliest CuDNN 7.6 fails on CUDA 10.0 (see 2 and 3), and both work fine on CUDA 9.0 (see 6).
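The rule stated above can be checked mechanically against the six configurations listed earlier. This is only a sketch: the table below is transcribed from the failing and working lists (numbered 1 to 6 in the order given), and the names `EXPERIMENTS`, `predicted_to_fail`, and `check` are made up for illustration.

```python
# (configuration number, CUDA version, CuDNN versions tried, worked?)
EXPERIMENTS = [
    (1, "10.0", ["7.6.3.30", "7.6.2.24", "7.6.0.64"], False),
    (2, "10.0", ["7.6.0.64"], False),
    (3, "10.0", ["7.6.3.30", "7.6.0.64"], False),
    (4, "10.0", ["7.5.1.10", "7.4.2.24", "7.4.1.5"], True),
    (5, "10.0", ["7.5.1.10"], True),
    (6, "9.0", ["7.6.3.30", "7.6.0.64", "7.5.1.10", "7.1.4", "7.0.5"], True),
]


def parse(version):
    """Turn '7.6.0.64' into (7, 6, 0, 64) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def predicted_to_fail(cuda, cudnn):
    """The hypothesis from the text: failure iff CUDA 10.0 and CuDNN >= 7.6.0."""
    return parse(cuda) == (10, 0) and parse(cudnn) >= (7, 6, 0)


def check():
    """Return every (number, CUDA, CuDNN) combination that contradicts the rule."""
    mismatches = []
    for number, cuda, cudnn_versions, worked in EXPERIMENTS:
        for cudnn in cudnn_versions:
            # A contradiction is a predicted failure that worked, or vice versa.
            if predicted_to_fail(cuda, cudnn) == worked:
                mismatches.append((number, cuda, cudnn))
    return mismatches


if __name__ == "__main__":
    print("contradictions:", check())
```

An empty list means the rule is consistent with every combination tried so far, which is weaker than saying the rule is the actual cause.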

Interestingly, the following variations of the code are also working fine on all system variations mentioned above:

model.add(layers.Conv1D(1, 3, input_shape=(3, 1)))  # changed input shape

and

model.add(layers.Conv1D(1, 3, input_shape=(8, 1)))
model.add(layers.Dense(1))  # added Dense layer!

So in summary, some specific TensorFlow code (but not all TensorFlow code) fails only with CuDNN 7.6 on CUDA 10.0. Unfortunately, TensorFlow 2 has been compiled against CuDNN 7.6.0, so I am not able to run any TF2 code.

What may be going on here?

This SU question seems very much related: https://superuser.com/questions/1397250/cudnn-error-failed-to-get-convolution-algorithm — bers, Sep 11 '19 at 05:28
Again cleaning all drivers, and again installing the most current drivers (436.30) seems to have fixed this. — bers, Sep 16 '19 at 10:56
Is your issue resolved? If not, `CUDNN_STATUS_ALLOC_FAILED` generally occurs when your GPU is running out of memory. — , Jun 10 '20 at 11:14
@TensorflowWarriors solved, see comment above. (Also, what do you think which operation in my example code allocates too much memory? :D) — bers, Jun 11 '20 at 21:32
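
The out-of-memory suggestion in the comments can be tested by asking TensorFlow to allocate GPU memory on demand instead of reserving most of it up front (the log above shows a device created with 4608 MB). This is a sketch under assumptions: the thread itself reports that reinstalling the drivers fixed the problem, so memory growth is not confirmed as relevant here, and the wrapper name `enable_memory_growth` is hypothetical. It has to run before any GPU operation is executed.

```python
def enable_memory_growth():
    """Ask TensorFlow to grow GPU memory on demand; return the GPUs configured."""
    try:
        import tensorflow as tf
    except ImportError:
        return []  # TensorFlow is not installed, so there is nothing to configure

    configured = []
    for gpu in tf.config.experimental.list_physical_devices("GPU"):
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            configured.append(gpu.name)
        except RuntimeError:
            # Raised when the GPU has already been initialized; the setting
            # can only be changed before the first GPU operation runs.
            pass
    return configured


if __name__ == "__main__":
    print("memory growth enabled for:", enable_memory_growth())
```

If the minimal example still fails with `CUDNN_STATUS_ALLOC_FAILED` after this call, memory pressure becomes a less likely explanation.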
