
I just installed tensorflow-gpu and started to train my convolutional neural network. The problem is that my GPU usage is constantly at 0% and only sometimes rises to about 20%. The CPU sits at around 20% and the disk above 60%. To check that the installation was correct I ran some matrix multiplications; in that case everything was fine and GPU usage was above 90%.
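The matrix-multiplication check was something along these lines (a sketch of that kind of test, not the exact code I used):

import tensorflow as tf

with tf.device("/gpu:0"):
    a = tf.random_normal([4096, 4096])
    b = tf.random_normal([4096, 4096])
    c = tf.matmul(a, b)

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(c)  # GPU usage should spike while this runs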

with tf.device("/gpu:0"):
    #here I set up the computational graph

When I run the graph I use this, so that any operation without a GPU implementation is placed on the CPU instead:

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:

I have an NVIDIA GeForce GTX 950M graphics card and I don't get any errors at runtime. What am I doing wrong?

Later edit: my computation graph

with tf.device("/gpu:0"):
    X = tf.placeholder(tf.float32, shape=[None, height, width, channels], name="X")
    dropout_rate= 0.3


    training = tf.placeholder_with_default(False, shape=(), name="training")
    X_drop = tf.layers.dropout(X, dropout_rate, training = training)

    y = tf.placeholder(tf.int32, shape = [None], name="y")


    conv1 = tf.layers.conv2d(X_drop, filters=32, kernel_size=3,
                            strides=1, padding="SAME",
                            activation=tf.nn.relu, name="conv1")

    conv2 = tf.layers.conv2d(conv1, filters=64, kernel_size=3,
                            strides=2, padding="SAME",
                            activation=tf.nn.relu, name="conv2")

    pool3 = tf.nn.max_pool(conv2,
                            ksize=[1, 2, 2, 1],
                            strides=[1, 2, 2, 1],
                            padding="VALID")

    conv4 = tf.layers.conv2d(pool3, filters=128, kernel_size=4,
                            strides=3, padding="SAME",
                            activation=tf.nn.relu, name="conv4")

    pool5 = tf.nn.max_pool(conv4,
                            ksize=[1, 2, 2, 1],
                            strides=[1, 1, 1, 1],
                            padding="VALID")


    pool5_flat = tf.reshape(pool5, shape = [-1, 128*2*2])

    fullyconn1 = tf.layers.dense(pool5_flat, 128, activation=tf.nn.relu, name = "fc1")
    fullyconn2 = tf.layers.dense(fullyconn1, 64, activation=tf.nn.relu, name = "fc2")

    logits = tf.layers.dense(fullyconn2, 2, name="output")

    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y)

    loss = tf.reduce_mean(xentropy)
    optimizer = tf.train.AdamOptimizer()
    training_op = optimizer.minimize(loss)

    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

    init = tf.global_variables_initializer()
saver = tf.train.Saver()

hm_epochs = 100
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True

The batch size is 128.

with tf.Session(config=config) as sess:
        tbWriter = tf.summary.FileWriter(logPath, sess.graph)
        dataset = tf.data.Dataset.from_tensor_slices((training_images, training_labels))
        dataset = dataset.map(rd.decodeAndResize)
        dataset = dataset.batch(batch_size)

        testset = tf.data.Dataset.from_tensor_slices((test_images, test_labels))
        testset = testset.map(rd.decodeAndResize)
        testset = testset.batch(len(test_images))

        iterator = dataset.make_initializable_iterator()
        test_iterator = testset.make_initializable_iterator()
        next_element = iterator.get_next()
        sess.run(tf.global_variables_initializer())
        for epoch in range(hm_epochs):
            epoch_loss = 0
            sess.run(iterator.initializer)
            while True:
                try:
                    epoch_x, epoch_y = sess.run(next_element)
                    # _, c = sess.run([optimizer, cost], feed_dict={x: epoch_x, y: epoch_y})
                    # epoch_loss += c
                    sess.run(training_op, feed_dict={X:epoch_x, y:epoch_y, training:True})
                except tf.errors.OutOfRangeError:
                    break


            sess.run(test_iterator.initializer)
            # acc_train = accuracy.eval(feed_dict={X:epoch_x, y:epoch_y})
            try:
                next_test = test_iterator.get_next()
                test_images, test_labels = sess.run(next_test)
                acc_test = accuracy.eval(feed_dict={X:test_images, y:test_labels})
                print("Epoch {0}: Train accuracy {1}".format(epoch, acc_test))
            except tf.errors.OutOfRangeError:
                break
            # print("Epoch {0}: Train accuracy {1}, Test accuracy: {2}".format(epoch, acc_train, acc_test))
        save_path = saver.save(sess, "./my_first_model")

I have 9k training images and 3k test images.

    There can be many reasons for this, and it is impossible to tell exactly what the reason is without more details and code. One possible explanation is that feeding and preparing your input data batches takes a lot of time (this is typically done on the CPU). Meanwhile, the GPU is idle waiting for something to work on. – mikkola Mar 17 '18 at 16:19
  • hi mikkola, thank you for your reply. I edited the post and added the code. – Laci Szakács Mar 17 '18 at 16:44

2 Answers


There are a few issues in your code that may result in low GPU usage.

1) Add a prefetch step at the end of your Dataset pipeline so that the CPU can keep a buffer of input batches ready to be moved to the GPU.

# this should be the last thing in your pipeline
dataset = dataset.prefetch(1)
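Applied to the pipeline from the question, only the last line changes (a sketch; rd.decodeAndResize, training_images, training_labels and batch_size are the names from your post):

dataset = tf.data.Dataset.from_tensor_slices((training_images, training_labels))
dataset = dataset.map(rd.decodeAndResize)   # decoding/resizing runs on the CPU
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(1)               # keep one batch ready while the GPU works on the current one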

2) You are using feed_dict to feed your model, together with Dataset iterators. This is not the intended way! feed_dict is the slowest way to get data into your model and is not recommended. You should define your model in terms of the next_element outputs of the iterators.

Example:

next_x, next_y = iterator.get_next()
with tf.device('/GPU:0'):
    conv1 = tf.layers.conv2d(next_x, filters=32, kernel_size=3,
                        strides=1, padding="SAME",
                        activation=tf.nn.relu, name="conv1")
    # rest of model here...
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, 
                 labels=next_y)

Then you can call your training operation without using feed_dict, and the iterator will handle feeding data to your model behind the scenes. Here is another related Q&A. Your new training loop would look something like this:

while True:
    try:
        sess.run(training_op, feed_dict={training:True})
    except tf.errors.OutOfRangeError:
        break

You should only input data via feed_dict that your iterator does not provide, and these should typically be very lightweight.

For further performance tips, you can refer to this guide on the TF website.

mikkola
  • Thank you very much for the help, I will change those things and I will come back with an answer. – Laci Szakács Mar 17 '18 at 18:19
  • I have a question. If I integrate the iterator into my computation graph, how can I feed in the testing data? – Laci Szakács Mar 18 '18 at 10:44
  • @LaciSzakács you can use a re-initializable or feedable iterator (see the sketch after these comments). Check out the in-depth guide here: https://www.tensorflow.org/programmers_guide/datasets#creating_an_iterator – mikkola Mar 18 '18 at 13:51
  • I managed to use a re-initializable iterator. Now my usage percentage grows till 20% periodically... but still does not use 100% of my gpu :( – Laci Szakács Mar 18 '18 at 19:28
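For reference, a minimal sketch of the re-initializable iterator mentioned in the comment above, assuming dataset and testset are the batched tf.data.Dataset objects from the question:

# one iterator defined by structure, not bound to a specific dataset
iterator = tf.data.Iterator.from_structure(dataset.output_types,
                                           dataset.output_shapes)
next_x, next_y = iterator.get_next()   # build the model on these tensors

training_init_op = iterator.make_initializer(dataset)
test_init_op = iterator.make_initializer(testset)

# inside the session:
#   sess.run(training_init_op)  -> next_x / next_y now yield training batches
#   sess.run(test_init_op)      -> next_x / next_y now yield test batches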

You could try the following code to see whether TensorFlow recognizes your GPU:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
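If the GPU is visible, the printed list should include a GPU entry (a device with a name like /device:GPU:0) in addition to /device:CPU:0; if only the CPU shows up, the problem is with the tensorflow-gpu/CUDA/cuDNN setup rather than with the training code.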
F0123X