
In my TensorFlow pipeline, I defined a load() function that needs to verify whether a specific image file exists under the given path. It looks something like this:

import tensorflow as tf

def load(image_file):

  if tf.io.gfile.exists(image_file):
    input_image = tf.io.read_file(image_file)
    # do things with input_image

  return input_image

This works without problems on its own. The error arises when I later map this function while setting up the dataset:

train_dataset = tf.data.Dataset.list_files(IMAGE_PATH)
train_dataset = train_dataset.map(load,
                              num_parallel_calls=tf.data.experimental.AUTOTUNE)


#...
TypeError: in converted code:

<ipython-input-22-bdfc518ba578>:13 load  *
    if tf.io.gfile.exists(image_file):
/home/bdavid/.conda/envs/DL_env/lib/python3.7/site-packages/tensorflow_core/python/lib/io/file_io.py:280 file_exists_v2
    pywrap_tensorflow.FileExists(compat.as_bytes(path))
/home/bdavid/.conda/envs/DL_env/lib/python3.7/site-packages/tensorflow_core/python/util/compat.py:87 as_bytes
    (bytes_or_text,))

TypeError: Expected binary or unicode string, got <tf.Tensor 'args_0:0' shape=() dtype=string>

The problem seems to be that image_file is not evaluated eagerly inside map(), while tf.io.gfile.exists demands a plain string as input, not a Tensor of type string.
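
To make the mismatch easier to see outside of the pipeline, here is a minimal stand-alone sketch (the file name is made up):

import tensorflow as tf

tf.io.gfile.exists('some_image.png')               # fine: a plain Python string
tf.io.gfile.exists(tf.constant('some_image.png'))  # raises the same TypeError: a Tensor is not a str/bytes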

I have already tried getting the string value via image_file.numpy(), which results in AttributeError: 'Tensor' object has no attribute 'numpy'.

I also tried wrapping my function in a tf.py_function() as suggested in this closely related question, which results in the exact same TypeError during execution. Using os.path.exists instead of tf.io.gfile.exists of course throws the same error as well.

Any suggestion on a work-around or proper way of dealing with this would be highly appreciated!

Bastian David

1 Answer


I have created a workaround for this using map_fn and matching_files, which I executed without any errors.

I think you could try this approach in your code.

def load(image_file):
  # inside tf.map_fn the path arrives as an eager string tensor,
  # so .numpy() yields a plain byte string that tf.io.gfile.exists accepts
  if tf.io.gfile.exists(image_file.numpy()):
    input_image = tf.io.read_file(image_file)

  return input_image

IMAGE_PATH = '/content/images'
# train_dataset = tf.data.Dataset.list_files(IMAGE_PATH)
# matching_files returns a 1-D string tensor of all paths matching the pattern
tf_matching = tf.io.matching_files('/content/images/*.png')
# train_dataset = train_dataset.map(load, num_parallel_calls=tf.data.experimental.AUTOTUNE)
# map_fn applies load to each path eagerly and stacks the results into one tensor
train_dataset = tf.map_fn(load, tf_matching)

I have also included your original code, commented out, for comparison.
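
Note that with this approach train_dataset is a plain tensor holding the raw file contents, not a tf.data.Dataset. As a quick check (just a sketch, assuming you still want the usual dataset methods afterwards), you could wrap it back into a dataset:

train_dataset = tf.data.Dataset.from_tensor_slices(train_dataset)
for raw_image in train_dataset.take(1):
  print(raw_image.dtype)  # tf.string, the bytes returned by tf.io.read_file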

You could read more about the functions I used in the TensorFlow API documentation for tf.map_fn and tf.io.matching_files.

TF_Support
  • One has to add the output type for tf.map_fn if it's different from the input type (in this case the output is an image, for me a `tf.float32`). While this overcomes the error I get from evaluating the `tf.io.gfile.exists()` input in EagerMode, the solution actually crashed my PC: it applies the function directly to all images instead of building the computational graph and only evaluating it when needed. It therefore tries to fit all images into memory, which I guess is a bad idea in general. Any idea how else to deal with this? How do you make it work in your pipeline? – Bastian David Mar 23 '20 at 14:51