
I would like to switch my old queue-based pipeline to the new Dataset API in TensorFlow for performance reasons. However, since the change, my code runs in 8 hours instead of 2.

My GPU utilization used to be around 30-40%, and it is now between 0 and 6%.

I found the line that makes it so slow: it is where I apply a Gaussian blur to my dataset:

def gaussian_blur(imgs, lbls):
    imgs = tf.nn.conv2d(imgs, k_conv,
                        strides=[1, 1, 1, 1],
                        padding='SAME',
                        data_format='NHWC')
    return imgs, lbls

ds = ds.map(gaussian_blur)
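
For reference, k_conv is a precomputed Gaussian convolution kernel. A kernel like it can be built along these lines (the size, sigma, and channel count here are illustrative, not necessarily the values from my code):

import numpy as np
import tensorflow as tf

def make_gaussian_kernel(size=5, sigma=1.0, channels=3):
    # 1-D Gaussian, then outer product to get the 2-D kernel
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel_2d = np.outer(g, g)
    kernel_2d /= kernel_2d.sum()
    # tf.nn.conv2d expects [height, width, in_channels, out_channels];
    # zeros off the channel diagonal blur each channel independently.
    kernel = np.zeros((size, size, channels, channels), dtype=np.float32)
    for c in range(channels):
        kernel[:, :, c, c] = kernel_2d
    return tf.constant(kernel)

k_conv = make_gaussian_kernel()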

With my old queue-based pipeline, this line barely slowed my program down.

I think this is because the line used to run on the GPU, but the new Dataset API forces it to run on the CPU, which is much slower and already at 100% utilization.

Do you have any idea how I can apply the Gaussian blur without such a big performance hit? Should I keep my old queue-based pipeline?

arretdes
  • Can you move the operation to your model? Right now you appear to be doing it when generating data, which is meant to happen on the CPU so that it can run in parallel with the training. – matt Apr 16 '19 at 06:58 (sketched below)
  • Thank you! It's the only way I found to compute it on my GPU. Unfortunately, it's really not convenient and I eventually gave up applying the Gaussian blur. – arretdes Apr 24 '19 at 01:47
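
A rough sketch of what moving the blur into the model graph could look like, TF 1.x style; build_model, batch_size, and the iterator setup are hypothetical names, not from the original code:

import tensorflow as tf

# Keep the tf.data pipeline cheap: batch and prefetch only, no blur here.
ds = ds.batch(batch_size).prefetch(1)
images, labels = tf.compat.v1.data.make_one_shot_iterator(ds).get_next()

# Apply the blur as the first op of the model graph; graph ops are placed
# on the GPU by default, so the convolution runs there during training.
blurred = tf.nn.conv2d(images, k_conv, strides=[1, 1, 1, 1],
                       padding='SAME', data_format='NHWC')
logits = build_model(blurred)  # build_model is a hypothetical model function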

1 Answer


Although I have not tried this on a tf.data dataset, it should be applicable. I found this combination to be highly performant and simple:

import tensorflow as tf
import tensorflow_addons as tfa

dummy_dataset = tf.ones((1000, 224, 224, 3))
blurred_dummy_dataset = tf.map_fn(tfa.image.gaussian_filter2d, dummy_dataset)
# ~ 3 second runtime in Google Colab on a Tesla K80

https://www.tensorflow.org/addons/api_docs/python/tfa/image/gaussian_filter2d
https://www.tensorflow.org/api_docs/python/tf/map_fn
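
If the blur should stay inside the tf.data pipeline, the same filter can also be applied in ds.map, keeping the (imgs, lbls) signature from the question. This is an untested sketch, and filter_shape and sigma are illustrative values:

import tensorflow as tf
import tensorflow_addons as tfa

def gaussian_blur(imgs, lbls):
    # gaussian_filter2d accepts 3-D (H, W, C) or 4-D (N, H, W, C) images,
    # so it works on single examples or on already-batched elements.
    return tfa.image.gaussian_filter2d(imgs, filter_shape=5, sigma=1.0), lbls

ds = ds.map(gaussian_blur, num_parallel_calls=tf.data.experimental.AUTOTUNE)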

Ryan Walden