
The example below is taken from the official TensorFlow tutorial on data pipelines. Basically, it resizes a bunch of JPGs to shape (128, 128, 3). For some reason, after applying the `map()` operation, the colour dimension, namely 3, turns into `None` when I examine the shape of the dataset. Why is that third dimension singled out? (I checked whether any images were not (128, 128, 3) but didn't find any.)

If anything, `None` should only show up for the very first dimension, i.e., the one that counts the number of examples. It should not affect the individual dimensions of the examples, since, as nested structures, they're supposed to have the same shape anyway in order to be stored in a `tf.data.Dataset`.
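For reference, here is a minimal sketch of where `tf.data` itself reports an example-count dimension. It uses hypothetical all-zero data in place of the flower photos. An unbatched dataset's `element_spec` describes a single example, so it has no example-count dimension at all. Only `batch()` adds a leading dimension, and that dimension is `None` unless `drop_remainder=True`.

```python
import tensorflow as tf

# Hypothetical toy data: four "images" of shape (128, 128, 3).
ds = tf.data.Dataset.from_tensor_slices(tf.zeros([4, 128, 128, 3]))

# Each element is one example, so there is no example-count dimension.
element_shape = ds.element_spec.shape

# batch() adds a leading dimension. It is None because the last batch
# may be smaller than batch_size when drop_remainder=False.
batched_shape = ds.batch(3).element_spec.shape

# With drop_remainder=True every batch is full, so the size is static.
fixed_batched_shape = ds.batch(3, drop_remainder=True).element_spec.shape

print(element_shape)        # (128, 128, 3)
print(batched_shape)        # (None, 128, 128, 3)
print(fixed_batched_shape)  # (3, 128, 128, 3)
```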

The code in TensorFlow 2.1 is

```python
import os
import pathlib
import tensorflow as tf

# Download the files.
flowers_root = tf.keras.utils.get_file(
    'flower_photos',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True)
flowers_root = pathlib.Path(flowers_root)

# Compile the list of files.
list_ds = tf.data.Dataset.list_files(str(flowers_root/'*/*'))

# Read an image from a file, decode it into a dense tensor, and resize it
# to a fixed height and width.
def parse_image(filename):
  # The label is the name of the image's parent directory.
  # os.sep is '\\' on Windows and '/' on Linux.
  parts = tf.strings.split(filename, os.sep)
  label = parts[-2]

  image = tf.io.read_file(filename)
  image = tf.image.decode_jpeg(image)
  image = tf.image.convert_image_dtype(image, tf.float32)
  image = tf.image.resize(image, [128, 128])
  print("Image shape:", image.shape)
  return image, label

print("Map the parse_image() on the first image only:")
file_path = next(iter(list_ds))
image, label = parse_image(file_path)

print("Map the parse_image() on the whole dataset:")
images_ds = list_ds.map(parse_image)
```

and it prints:

```
Map the parse_image() on the first image only:
Image shape: (128, 128, 3)
Map the parse_image() on the whole dataset:
Image shape: (128, 128, None)
```

Why None in that last line?

Tfovid

1 Answer


You are missing this part of the tutorial:

```python
for image, label in images_ds.take(5):
    show(image, label)  # show() is a plotting helper defined in the tutorial
```

The line

```python
images_ds = list_ds.map(parse_image)
```

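only builds the pipeline: `map()` calls `parse_image` with a symbolic placeholder tensor to trace it into a graph, so no actual image is passed to the function at that point. That is why, if you add prints inside the function, its argument has no concrete value. But if you iterate with `for image, label in images_ds.take(5):`, each image is passed through the `parse_image` function.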
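The placeholder also accounts for the `None`. Here is a minimal sketch that uses hypothetical in-memory JPEGs instead of the flower files. When `tf.image.decode_jpeg` is called without a `channels` argument, it takes the channel count from the file at run time. A traced call therefore cannot know that count, and the static shape has `None` in that position. An eager call on a real image, like the first call in the question, sees 3. Passing `channels=3` is an addition here, not part of the question's code; it makes the last dimension static even when traced.

```python
import tensorflow as tf

# Hypothetical stand-ins for the flower photos: three random 64x64 RGB
# images, encoded as JPEG byte strings.
pixels = tf.cast(
    tf.random.uniform([3, 64, 64, 3], maxval=256, dtype=tf.int32), tf.uint8)
jpegs = tf.stack([tf.io.encode_jpeg(img) for img in pixels])
ds = tf.data.Dataset.from_tensor_slices(jpegs)

def decode_any_channels(contents):
    # channels defaults to 0: use whatever channel count the JPEG has.
    image = tf.image.decode_jpeg(contents)
    image = tf.image.convert_image_dtype(image, tf.float32)
    return tf.image.resize(image, [128, 128])

def decode_rgb(contents):
    # channels=3 fixes the last dimension even when the function is traced.
    image = tf.image.decode_jpeg(contents, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    return tf.image.resize(image, [128, 128])

# Eager call on one concrete image: the real channel count is known.
eager_shape = decode_any_channels(jpegs[0]).shape
# map() traces the function with a symbolic string tensor instead.
traced_shape = ds.map(decode_any_channels).element_spec.shape
fixed_shape = ds.map(decode_rgb).element_spec.shape

print(eager_shape)   # (128, 128, 3)
print(traced_shape)  # (128, 128, None)
print(fixed_shape)   # (128, 128, 3)
```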

chrisgiffy
  • For some reason, I thought that `.map()` would execute eagerly as that would presumably be the default behavior with TensorFlow 2.0. It seems, however, that's not the case, and the "eagerness" is only triggered by `.take()`? – Tfovid Apr 28 '20 at 18:15
  • `map()` creates a graph from the `parse_image` function. When you iterate over the mapped dataset, TensorFlow executes that graph. – chrisgiffy Apr 28 '20 at 19:53
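Here is a minimal sketch of that split between tracing and execution. It uses a toy `Dataset.range` pipeline rather than the flower photos, and the `trace_log` list is a hypothetical helper added for illustration. The plain-Python side effect runs only while `map()` traces `add_one`. The `tf.print` op, by contrast, runs for every element each time the dataset is iterated. Like `map()`, `take()` only returns a new dataset. The work happens when a `for` loop, or `next(iter(...))`, pulls elements.

```python
import tensorflow as tf

trace_log = []  # hypothetical Python-side record of each time the body runs

def add_one(x):
    # Plain Python: runs only while tf.data traces add_one into a graph,
    # where x is a symbolic tensor and eager execution is off.
    trace_log.append(tf.executing_eagerly())
    # Graph op: executed once per element whenever the dataset is iterated.
    tf.print("processing element", x)
    return x + 1

ds = tf.data.Dataset.range(3).map(add_one)  # tracing happens here
traces_after_map = len(trace_log)

values = [int(v) for v in ds]  # iteration executes the traced graph
traces_after_iteration = len(trace_log)

print("Python-side calls after map():", traces_after_map)
print("Python-side calls after iterating:", traces_after_iteration)
print(values)  # [1, 2, 3]
```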