
I'm trying to create a TensorFlow Dataset from multichannel TIFF files. The tfio.experimental.image.decode_tiff(image) function from TensorFlow I/O works for 4 channels only, so I tried to read the file into NumPy first (in this case with rasterio) and then convert it to a TF tensor like this:

import tensorflow as tf
import rasterio as rio

@tf.compat.v2.autograph.experimental.do_not_convert
def parse_image(img_path: str) -> dict:
    
    src = rio.open(img_path)
    image_numpy = src.read()
    image = tf.convert_to_tensor(image_numpy, dtype=tf.float32)

    return {'image': image}

train_dataset = tf.data.Dataset.list_files("multipage_tiff_example.tif", seed=50)
train_dataset = train_dataset.map(parse_image)

Unfortunately, I can't get past this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-77-8308896524b1> in <module>
     11 
     12 train_dataset = tf.data.Dataset.list_files("multipage_tiff_example.tif", seed=50)
---> 13 train_dataset = train_dataset.map(parse_image)

[...]

~/opt/anaconda3/envs/proj1/lib/python3.7/site-packages/rasterio/__init__.py in open(fp, mode, driver, width, height, count, crs, transform, dtype, nodata, sharing, **kwargs)
    156     if not isinstance(fp, string_types):
    157         if not (hasattr(fp, 'read') or hasattr(fp, 'write') or isinstance(fp, Path)):
--> 158             raise TypeError("invalid path or file: {0!r}".format(fp))
    159     if mode and not isinstance(mode, string_types):
    160         raise TypeError("invalid mode: {0!r}".format(mode))

TypeError: invalid path or file: <tf.Tensor 'args_0:0' shape=() dtype=string>

The exemplary tif file is here: http://www.nightprogrammer.org/wp-uploads/2013/02/multipage_tiff_example.tif

tf.__version__: 2.5.0-rc3
rio.__version__: 1.1.8

I did lots of cross-checks. For example:

  • Loading a file via rasterio to numpy works
  • parse() function with standard jpg and tf.io inside works:
    image = tf.io.read_file(img_path)
    image = tf.image.decode_jpeg(image, channels=3)
  • same error occurs when I pass to Dataset.list_files() the whole dir with files using proper dir+files patterns

The "invalid path or file" error is raised by rasterio, but I suspect it is related to, or caused by, TF mechanisms.
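A minimal check, independent of rasterio, that illustrates the likely cause: Dataset.map does not call the function with Python values for each element; it traces it once into a graph, so the argument is a symbolic scalar tf.string tensor. That matches the `<tf.Tensor 'args_0:0' shape=() dtype=string>` in the traceback, and it is what rasterio.open rejects. The `do_not_convert` decorator only turns off AutoGraph's source rewriting; the function is still traced in graph mode. The file names below are placeholders and are never opened.

```python
import tensorflow as tf

observed = {}


def inspect_arg(path):
    # Dataset.map traces this function in graph mode, so `path` is a
    # symbolic scalar string tensor (like 'args_0:0'), not a Python str.
    observed["is_str"] = isinstance(path, str)
    observed["is_tensor"] = isinstance(path, tf.Tensor)
    observed["dtype"] = path.dtype
    return path


# Tracing happens as soon as .map() is called; no element is read yet.
ds = tf.data.Dataset.from_tensor_slices(["a.tif", "b.tif"]).map(inspect_arg)
print(observed)
```

TF-native ops such as tf.io.read_file accept that symbolic tensor, which is why the JPEG variant works, while a plain Python library like rasterio needs a real string.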

Could you please advise how to make it work?

Lukiz
  • @ Lukiz I am facing same error in reading multi-band images. Would you please share how you solved this problem? – rayan Nov 30 '21 at 12:08
  • @rayan Unfortunately I didn't. I have put this project on-hold since then. – Lukiz Nov 30 '21 at 17:16

1 Answer


This is an old question, but it showed up when I faced the same problem. In my case I'm working with 6-channel images of shape (256, 256, 6).

I figured out how to solve this problem in TensorFlow 1.14, and I believe the same approach works in TensorFlow 2 (I don't have that version on my machine).

You can use tf.py_function to wrap the parse_image function:

# TF 1.14 needs to enable eager execution (default in TF 2)
tf.enable_eager_execution()

# df_img_train['images_path'] holds the images path

ds = tf.data.Dataset.from_tensor_slices(df_img_train['images_path'])
ds = ds.map(lambda x: tf.py_function(parse_image, [x], [tf.float32])).batch(8)

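Then adjust the parse_image function to read the path string out of the Tensor. Define it before running the ds.map(...) line, because the lambda looks up parse_image when tf.data traces it: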

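import numpy as np
import rasterio
import tensorflow as tf


def parse_image(img_path: tf.Tensor) -> np.ndarray:
    # Inside tf.py_function the argument is an eager string Tensor:
    # .numpy() gives bytes, which are decoded to a Python str
    img_path = img_path.numpy().decode('utf-8')

    with rasterio.open(img_path) as src:
        img = src.read()  # shape: (bands, height, width)
        # Channels last: (height, width, bands)
        img = np.moveaxis(img, 0, 2)

    # Match the tf.float32 output type declared in tf.py_function
    return img.astype(np.float32)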

Note that this implementation returns a NumPy array, which tf.py_function converts to a tensor of the dtype given as its third argument (Tout), here tf.float32. Now it's possible to iterate over the dataset:

for img in ds:
    print(img[0].shape)

# Output: (8, 256, 256, 6)
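The loop prints img[0].shape rather than img.shape because Tout is passed as the list [tf.float32]: tf.py_function then returns a list, and tf.data turns a list returned from a map function into a tuple, so each element is a 1-tuple. Passing a single dtype yields the tensor directly. A small sketch of the difference, where `load` is a hypothetical stand-in for the rasterio reader so it runs without any TIFF file:

```python
import numpy as np
import tensorflow as tf


def load(path):
    # Hypothetical stand-in for parse_image: ignores the path and
    # returns a float32 array shaped (height, width, bands).
    return np.zeros((2, 2, 6), dtype=np.float32)


paths = tf.data.Dataset.from_tensor_slices(["a.tif", "b.tif"])

# Tout given as a list -> each element is a 1-tuple, hence `img[0]`.
as_list = paths.map(lambda x: tf.py_function(load, [x], [tf.float32]))
# Tout given as a single dtype -> each element is the tensor itself.
as_tensor = paths.map(lambda x: tf.py_function(load, [x], tf.float32))

print(as_list.element_spec)
print(as_tensor.element_spec)
```

In both cases the static shape is unknown, because tf.py_function cannot infer what the Python function will return.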

I noticed that you want to return a dictionary, but this code may serve you as well.
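For TF 2.x (eager by default, so no enable_eager_execution call) and with the dictionary output from the question, the same tf.py_function idea could look like the sketch below. It is untested against real TIFFs; `read_multiband`, `make_dataset`, `num_bands` and `reader` are hypothetical names introduced here, and the band count must be known in advance to restore the channel dimension that tf.py_function drops.

```python
import numpy as np
import tensorflow as tf


def read_multiband(path: str) -> np.ndarray:
    """Read every band of a raster with rasterio, channels last, as float32."""
    import rasterio  # imported lazily so the pipeline can be tested without it

    with rasterio.open(path) as src:
        img = src.read()  # shape: (bands, height, width)
    return np.moveaxis(img, 0, -1).astype(np.float32)  # (height, width, bands)


def make_dataset(paths, num_bands, reader=read_multiband, batch_size=8):
    """Hypothetical helper: decode multi-band rasters via tf.py_function."""

    def _load(path_tensor):
        # Runs eagerly inside tf.py_function: .numpy() gives bytes -> str.
        return reader(path_tensor.numpy().decode("utf-8"))

    def _parse(path):
        image = tf.py_function(_load, [path], tf.float32)
        # py_function loses static shape information; restore what we know.
        image.set_shape([None, None, num_bands])
        return {"image": image}

    ds = tf.data.Dataset.from_tensor_slices(list(paths))
    ds = ds.map(_parse)
    return ds.batch(batch_size)
```

The `reader` parameter exists only so the pipeline can be exercised with a fake reader; with real files you would call something like `make_dataset(file_list, num_bands=6)`. Note that .batch() requires all images in a batch to share the same height and width.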