
I created a ResNet model and saved it, but when trying to run predictions after loading in a different notebook, I get a bunch of errors.

Let's say I have notebooks A and B. In notebook A I created a model called resnet_model. I can run predictions and it's all fine. I saved the model like this:

resnet_model.save(os.path.join(DATAPATH,"res1_network.h5"))

I then reload the model, still in notebook A:

loaded_model = load_model(os.path.join(DATAPATH,"res1_network.h5"))

I run predictions on it and the results are exactly the same as before. But if I go to notebook B, load the model, and attempt to predict like this:

res1_model = load_model(os.path.join(DATAPATH,"res1_network.h5"))
res1_model.predict(pred_list, verbose=1)

I get this error:

UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node model_2/conv2d/Conv2D (defined at C:\Users\Dave\Miniconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\framework\ops.py:1751) ]] [Op:__inference_distributed_function_23018]

Function call stack:
distributed_function

How can cuDNN not be working properly in this notebook when it's fine in the other one? How could I even have built the model if it doesn't work?

talonmies

2 Answers


By default, TensorFlow maps nearly all of the GPU memory of all visible GPUs (source), so this may be related to that.

You can try resetting the kernel of your notebook A to free that memory before running notebook B.

Alternatively, you can set the allow_growth option so that TensorFlow only allocates more GPU memory as it is needed:

import tensorflow as tf
from tensorflow.keras import backend as K

# Allocate GPU memory on demand instead of grabbing it all up front
tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
sess = tf.Session(config=tf_config)
K.set_session(sess)
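The snippet above uses the TensorFlow 1.x session API. If you are on TensorFlow 2.x (which the tensorflow_core path in your traceback suggests), a rough equivalent is the sketch below; it assumes at least one GPU is visible to TensorFlow and must run before anything initializes the GPUs:

```python
import tensorflow as tf

# TF 2.x: request on-demand memory allocation for each visible GPU.
# This must be called before the GPUs are initialized by any op.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
```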

Of course, that works depending on how many sessions you are running at the same time, total GPU memory, size of the models, etc.

To check how much memory is currently being used, you can use nvidia-smi. I'm not a Windows user, but maybe this answer can help you (How do I run nvidia-smi on Windows?).
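For example, from a terminal (assuming the NVIDIA driver is installed and nvidia-smi is on your PATH), you can query just the memory figures:

```shell
# Show used vs. total GPU memory for each device
nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```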

spadarian
This code works for me (TensorFlow >= 2.0):

import tensorflow as tf

# Cap this process at 80% of GPU memory, and grow the allocation as needed
config = tf.compat.v1.ConfigProto(
    gpu_options=tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.8)
    # device_count={'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(session)

If this does not work, then you may have to reinstall the CUDA software from scratch (see: How we install tensorflow gpu).

Welcome_back