4

I have a very powerful Windows 10 PC with 112 GB of memory, 16 cores, and 3x GeForce RTX 2070 (which doesn't support SLI, etc.). It is running cuDNN 7.5 + TensorFlow 1.13 + Python 3.7.

My issue is that I get the error below whenever I try to train a Keras model or make a prediction on a matrix. At first I thought it happened only when I ran more than one program simultaneously, but that was not the case: now I also get the error when running a single instance of Keras (often, but not always).

2019-06-15 19:33:17.878911: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 6317 MB memory) -> physical GPU (device: 2, name: GeForce RTX 2070, pci bus id: 0000:44:00.0, compute capability: 7.5)
2019-06-15 19:33:23.423911: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library cublas64_100.dll locally
2019-06-15 19:33:23.744678: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-15 19:33:23.748069: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-15 19:33:23.751235: E tensorflow/stream_executor/cuda/cuda_blas.cc:510] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2019-06-15 19:33:25.267137: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2019-06-15 19:33:25.270582: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
Exception: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv2d_1/convolution}}]]
[[{{node dense_3/Sigmoid}}]]

Siddharth Das
PabloDK

2 Answers

5

On TensorFlow 2.0 and above, you can solve this issue as follows:

import os

os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

or

import tensorflow as tf

physical_devices = tf.config.experimental.list_physical_devices('GPU')
for gpu in physical_devices:
    tf.config.experimental.set_memory_growth(gpu, True)  # allocate GPU memory on demand
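Either way, the setting only takes effect if it happens before TensorFlow initializes its GPU devices. A minimal sketch of the environment-variable route (the flag name is TensorFlow's own; the placement before the `tensorflow` import is the point):

```python
import os

# Must be set before TensorFlow is imported/initialized, or it has no effect.
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

# import tensorflow as tf  # import TensorFlow only after the flag is set
```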
Bensuperpc
    I assume you didn't mean to indent the `if`. Is that correct? Also why are you only applying to one GPU? Thanks. – Robert Lugg Jun 24 '20 at 19:27
  • 1
    @RobertLugg Thanks, I edited the post. I have only one GPU, but you can use a for loop to configure multiple GPUs ^^ – Bensuperpc Sep 10 '20 at 07:57
1

Add the following to your code:

from keras.backend.tensorflow_backend import set_session
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # dynamically grow the memory used on the GPU
config.log_device_placement = True  # to log device placement (on which device the operation ran)
sess = tf.Session(config=config)
set_session(sess)  # set this TensorFlow session as the default session for Keras
Siddharth Das
  • 2
    But the key is to put it immediately below `import tensorflow as tf`, which I wasn't doing; I had written it after all the imports. – PabloDK Jun 15 '19 at 21:32