
I am trying to run a hyperparameter optimization (using Spearmint) on a large network with many trainable variables. I am worried that when I try a network with too many hidden units, TensorFlow will throw a GPU memory error.

I was wondering if there is a way to catch the GPU memory error thrown by TensorFlow and skip the batch of hyperparameters that caused it.
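The skip-on-error loop I have in mind can be sketched generically. In this sketch, `run_search` and `fake_train` are hypothetical stand-ins (not part of any library), and `MemoryError` stands in for whatever exception the framework raises on OOM; in TensorFlow that would be `tf.errors.ResourceExhaustedError`:

```python
def run_search(hyperparams, train_fn):
    """Try each hyperparameter setting; skip any that exhausts memory.

    train_fn is a hypothetical callable that builds and trains a model and
    may raise MemoryError (stand-in for the framework's OOM exception).
    Returns a list of (params, result) pairs for the runs that succeeded.
    """
    results = []
    for params in hyperparams:
        try:
            results.append((params, train_fn(params)))
        except MemoryError:
            # Skip this configuration and move on to the next one.
            continue
    return results

def fake_train(params):
    # Simulated trainer: pretend settings above 512 hidden units run out
    # of GPU memory, so we can exercise the skip path without a GPU.
    if params["hidden_units"] > 512:
        raise MemoryError("out of GPU memory (simulated)")
    return {"loss": 1.0 / params["hidden_units"]}

search_space = [{"hidden_units": n} for n in (128, 1024, 256)]
ok = run_search(search_space, fake_train)
# The 1024-unit run is skipped; the 128- and 256-unit runs survive.
print([p["hidden_units"] for p, _ in ok])
```

The question is whether TensorFlow's OOM error can be caught this way at all, which is what the code below tries to test.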

For example, I would like something like

import tensorflow as tf 

dim = [100000,100000]
X   = tf.Variable( tf.truncated_normal( dim, stddev=0.1 ) )

with tf.Session() as sess:
    try:
        tf.global_variables_initializer().run()
    except Exception as e:
        print(e)

When I run the code above to test the memory-error exception, it just prints the GPU memory error and aborts; execution never reaches the except block.

unknown_jy
    maybe your version is too old? I just [tried](https://github.com/yaroslavvb/stuff/blob/master/gpu_oom.py) in latest version, and it's caught on python side successfully – Yaroslav Bulatov Jan 30 '17 at 18:12

1 Answer


Try this:

import tensorflow as tf

try:
    with tf.device("gpu:0"):
        a = tf.Variable(tf.ones((10000, 10000)))
        sess = tf.Session()
        # initialize_all_variables() is deprecated; use the replacement.
        sess.run(tf.global_variables_initializer())
except tf.errors.ResourceExhaustedError as e:
    # TensorFlow raises ResourceExhaustedError on GPU OOM, and it is
    # catchable on the Python side like any other exception.
    print("Caught error:", e)
    import pdb; pdb.set_trace()

source : https://github.com/yaroslavvb/stuff/blob/master/gpu_oom.py

H4k333m