TensorFlow batching is very slow

Question

I tried to set up a very simple MNIST example with an Estimator.

First I used the estimator's deprecated fit() parameters x, y and batch_size. This ran very fast and utilized about 100% of my GPU while not affecting the CPU much (about 10% utilization). So it worked as expected.

Because the x, y and batch_size parameters are deprecated, I wanted to use the input_fn parameter of the fit() function instead. To build the input_fn, I used tf.train.slice_input_producer and batched its output with tf.train.batch. This is my code: https://gist.github.com/andreas-eberle/11f650fca0dce4c9d3d6c0955145e80d. You should be able to just run it with TensorFlow 1.0.

My problem is that the training now runs very slowly and only utilizes about 30% of my GPU (as shown in nvidia-smi).

I also tried increasing the queue capacity of the slice_input_producer and the number of threads used for batching. However, this only raised GPU utilization to about 45% and resulted in 100% CPU utilization.

What am I doing wrong? Is there a better way for feeding the inputs and batching them? I do not want to create the batches manually (creating subarrays of the numpy input array) because I want to use this example for a more complex input queue where I'll be reading and preprocessing the images in the graph.
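For reference, the manual batching mentioned above amounts to slicing the input arrays into consecutive sub-arrays. The following is only a minimal sketch of that idea in plain Python: the helper name `iterate_batches` and the toy data are hypothetical and not taken from the linked gist. The same slicing also works on NumPy arrays.

```python
def iterate_batches(features, labels, batch_size):
    """Yield consecutive (features, labels) slices of at most batch_size items."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    for start in range(0, len(features), batch_size):
        end = start + batch_size
        # The last slice may be shorter than batch_size.
        yield features[start:end], labels[start:end]


# Toy stand-in data (an assumption for illustration): 10 "images",
# represented here as plain integers, and one label per image.
features = list(range(10))
labels = [value % 2 for value in features]

batches = list(iterate_batches(features, labels, batch_size=4))
```

This only works when the whole dataset already sits in memory, which is why it does not carry over to an input pipeline that reads and preprocesses images on the fly.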

I don't think my hardware should be the problem:

Windows 10
NVidia GTX 960M
i7-6700HQ
32 GB RAM

Can you use `SKCompat` as a wrapper around your `Estimator`? That should have the equivalent functionality as the `x` and `y` args. I think the difference is that `slice_input_producer` is embedding the whole dataset in the graph (i.e. it expects `Tensors`), whereas `x`/`y`/`SKCompat` are using a `feed_fn` under the hood. — Allen Lavoie, Mar 13 '17 at 19:22
I tried using SKCompat, but wasn't able to find the correct import for it. Do you know the correct import? Furthermore, what should I do if I cannot hold the complete input data in memory and need to read it on the fly? I guess then I really need the tensorflow input_producers... — andy, Mar 14 '17 at 14:20
`tf.contrib.learn.SKCompat` works in 1.0.0 and [will be back for 1.1](https://github.com/tensorflow/tensorflow/pull/8254), but looks like there was a transient issue with overzealous interface sealing in 1.0.1 (should be moving into core TF soon). May be easier to just use the deprecated arguments in the meantime. Sorry you ran into that. [Queues](https://www.tensorflow.org/programmers_guide/threading_and_queues) are currently a good option for avoiding having the full dataset in memory. — Allen Lavoie, Mar 14 '17 at 18:14
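To illustrate the queue idea from the last comment, here is a rough sketch of a bounded producer/consumer pipeline using only the Python standard library. It is not TensorFlow's queue implementation; the names `produce_batches` and `run_pipeline`, the shard layout, and the parameter values are all assumptions made for illustration. The point is that a bounded queue keeps at most `capacity` batches waiting at any time, and that several producer threads can fill it in parallel.

```python
import queue
import threading


def produce_batches(batch_queue, shard, batch_size):
    """Cut one shard into batches and push them into the bounded queue."""
    # In a real pipeline this is where a file would be read and preprocessed;
    # here the shard is simply an in-memory list.
    for start in range(0, len(shard), batch_size):
        # put() blocks while the queue is full, which caps the number of
        # batches waiting in the queue.
        batch_queue.put(shard[start:start + batch_size])


def run_pipeline(shards, batch_size, capacity):
    """Run one producer thread per shard and consume all batches."""
    batch_queue = queue.Queue(maxsize=capacity)
    workers = [
        threading.Thread(target=produce_batches,
                         args=(batch_queue, shard, batch_size))
        for shard in shards
    ]
    for worker in workers:
        worker.start()

    def close_when_done():
        # Once every producer has finished, signal the consumer to stop.
        for worker in workers:
            worker.join()
        batch_queue.put(None)

    closer = threading.Thread(target=close_when_done)
    closer.start()

    consumed = []
    while True:
        batch = batch_queue.get()
        if batch is None:
            break
        consumed.append(batch)  # a training step would run here
    closer.join()
    return consumed


# Two hypothetical shards of six items each.
shards = [list(range(0, 6)), list(range(6, 12))]
batches = run_pipeline(shards, batch_size=2, capacity=2)
```

The order in which batches from different shards arrive is not deterministic. If the consumer is faster than the producers, it spends its time waiting on the queue, which is one plausible reading of the low GPU utilization described in the question.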
