6

I am running Keras neural network training and prediction on a GTX 1070 on Windows 10. Most of the time it works, but from time to time it fails with:

E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:359] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:366] error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:326] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\kernels\conv_ops.cc:659] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

This cannot be explained by the literal meaning of the errors, nor by an OOM error.

How can I fix this?

talonmies
Dims

6 Answers

6

Try limiting your GPU memory usage by setting the gpu option per_process_gpu_memory_fraction.

Fiddle around with it to see what works and what doesn't.

I recommend using .7 as a starting baseline.
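A minimal sketch of what that looks like, assuming the TF 1.x ConfigProto API that was current at the time (the 0.7 fraction follows the baseline above):

import tensorflow as tf

# Cap this process at 70% of the GPU's memory; tune the fraction to taste.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.7
sess = tf.Session(config=config)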

liam
  • This solved my problem, but the procedure to limit GPU usage seems to have changed. The following worked for me: https://github.com/tensorflow/tensorflow/issues/46038#issuecomment-753591451 – Kevin D. Jan 26 '21 at 21:13
3

I hit this problem occasionally with Keras on Windows 10. A reboot solved it for a short time, but it kept coming back.

Based on https://github.com/fchollet/keras/issues/1538, I added:

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

# Limit this process to 30% of the GPU's memory (TF 1.x / standalone Keras API).
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.3
set_session(tf.Session(config=config))

These settings solved the hang for me.

peroon
3

I found the solution to this problem. I had the same issue on Windows 10 with an Nvidia GeForce 920M. Make sure you have the correct version of the cuDNN library: if it is not compatible with your CUDA version, it won't throw an error during TensorFlow installation, but it will interfere with memory allocation on the GPU later. Do check your CUDA and cuDNN versions, and also follow the instructions about session creation mentioned above.
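If you are unsure which versions your TensorFlow build expects, here is a quick check (a sketch assuming TF 2.x, where tf.sysconfig.get_build_info() is available; on TF 1.x, compare nvcc --version against the tested-configurations table on the TensorFlow site instead):

import tensorflow as tf

# Print the CUDA/cuDNN versions this TensorFlow wheel was built against,
# to compare with what is actually installed on the machine.
info = tf.sysconfig.get_build_info()
print(info.get('cuda_version'), info.get('cudnn_version'))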

3

The issue is finally resolved for me; I spent many hours struggling with this.

I recommend following all the installation steps exactly as described in these links:

TensorFlow - https://www.tensorflow.org/install/install_windows

and for cuDNN -

https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#install-windows

For me this wasn't enough. I then updated my GeForce Game Ready Driver from the GeForce Experience window, and after a restart it started working.


The driver can also be downloaded from https://www.geforce.com/drivers.

Arkil Shaikh
3

Similar to what other people are saying, enabling memory growth for your GPUs can resolve this issue.

The following works for me when added at the beginning of the training script:

# Using Tensorflow-2.4.x
import tensorflow as tf

# Let TensorFlow allocate GPU memory on demand instead of grabbing it all
# up front; this must run before any GPU has been initialized.
try:
    tf_gpus = tf.config.list_physical_devices('GPU')
    for gpu in tf_gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
except RuntimeError:
    # set_memory_growth raises RuntimeError if the GPUs were already initialized.
    pass
driedler
0

The TF docs helped me a lot: Allowing GPU memory growth

The first is the allow_growth option, which attempts to allocate only as much GPU memory based on runtime allocations: it starts out allocating very little memory, and as Sessions get run and more GPU memory is needed, we extend the GPU memory region needed by the TensorFlow process. Note that we do not release memory, since that can lead to even worse memory fragmentation. To turn this option on, set the option in the ConfigProto by:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

or

with tf.Session(graph=graph_node, config=config) as sess:
     ...

The second method is the per_process_gpu_memory_fraction option, which determines the fraction of the overall amount of memory that each visible GPU should be allocated. For example, you can tell TensorFlow to only allocate 40% of the total memory of each GPU by:

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)
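As an aside, the Session/ConfigProto API above is TF 1.x only. On TF 2.x a rough equivalent of the fractional cap (my assumption, not from the quoted docs) is a logical device with a fixed memory_limit:

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Cap the first GPU at a fixed budget in MB; must run before GPU init.
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=4096)])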
Ari Gold