10

How do I create a tf.data.Dataset from tf.keras.preprocessing.image.ImageDataGenerator.flow_from_directory?

I'm considering tf.data.Dataset.from_generator, but it's unclear how to acquire the output_types keyword argument for it, given the return type:

A DirectoryIterator yielding tuples of (x, y) where x is a numpy array containing a batch of images with shape (batch_size, *target_size, channels) and y is a numpy array of corresponding labels.

A T

2 Answers

10

Both batch_x and batch_y in ImageDataGenerator are of type K.floatx(), which is float32 by default, so tf.float32 is the right output type for both unless you have changed the Keras float type.
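If you would rather check this empirically than rely on the default, one option is to pull a single batch and read its dtypes and shapes. The sketch below uses a hypothetical NumPy stand-in (`fake_directory_iterator`) in place of a real DirectoryIterator so that it runs without any image folder; `describe_batch` is likewise an illustrative helper, not part of Keras or TensorFlow.

```python
import numpy as np


def fake_directory_iterator(batch_size=4, target_size=(8, 8), channels=3, num_classes=2):
    """Hypothetical stand-in for a DirectoryIterator: yields (images, labels) forever."""
    rng = np.random.default_rng(0)
    while True:
        x = rng.random((batch_size, *target_size, channels), dtype=np.float32)
        # One-hot labels, mimicking class_mode='categorical'
        y = np.eye(num_classes, dtype=np.float32)[rng.integers(0, num_classes, batch_size)]
        yield x, y


def describe_batch(batch):
    """Return (dtype name, shape with the batch axis replaced by None) per element."""
    return tuple((arr.dtype.name, (None, *arr.shape[1:])) for arr in batch)


signature = describe_batch(next(fake_directory_iterator()))
print(signature)
```

With a real iterator, the dtype names found this way should correspond to the `output_types` tuple, and the shapes (batch axis left as None) to an `output_shapes` argument.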

A similar question was already discussed in "How to use Keras generator with tf.data API". Let me copy-paste the answer from there:

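import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def make_generator():
    # train_dataset_folder: path to a directory with one subfolder per class
    train_datagen = ImageDataGenerator(rescale=1. / 255)
    train_generator = train_datagen.flow_from_directory(
        train_dataset_folder, target_size=(224, 224),
        class_mode='categorical', batch_size=32)
    return train_generator

train_dataset = tf.data.Dataset.from_generator(make_generator, (tf.float32, tf.float32))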

The author faced another issue with the graph scope, but I guess it is unrelated to your question.

Or as a one-liner:

tf.data.Dataset.from_generator(
    lambda: ImageDataGenerator().flow_from_directory('folder_path'),
    (tf.float32, tf.float32))
A T
Dmytro Prylipko
  • Thanks, but this is giving me an error: 'TypeError: `generator` must be callable.'. `tf.data.Dataset.from_generator(ImageDataGenerator().flow_from_directory('folder_path'), (tf.float32, tf.float32))` – A T Feb 10 '19 at 00:19
  • Try `train_dataset = tf.data.Dataset.from_generator(make_generator(), ...` – Dmytro Prylipko Feb 10 '19 at 09:08
  • I did not quite understand how to fill `train_generator =` part. can you explain a bit more? – ARAT May 29 '19 at 02:45
  • @ Dmytro Prylipko In my case it does manage to transform it to a Dataset type, but the dimension are . Yet i've use exactly the same structure as above. – Yoan B. M.Sc Aug 05 '20 at 18:57
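
On the `TypeError` raised in the comments: `from_generator` expects a zero-argument callable that returns an iterator, not the iterator itself. A minimal pure-Python sketch of the difference follows, with a hypothetical `batches` generator standing in for the Keras iterator:

```python
def batches():
    """Hypothetical stand-in for the Keras iterator: a plain Python generator."""
    for i in range(3):
        yield i


iterator = batches()        # analogous to what flow_from_directory(...) returns
factory = lambda: iterator  # zero-argument callable wrapping that iterator

print(callable(iterator))   # False: passing this directly triggers the TypeError
print(callable(factory))    # True: this form is accepted
print(callable(batches))    # True: a function that builds a fresh iterator also works
```

Note that `lambda: iterator` hands back the same iterator object on every call, so its state is shared between passes over the dataset, whereas passing a function such as `make_generator` (without calling it) builds a new iterator each time.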
7

Here is my solution. To show how it works, I use the cats and dogs dataset:

import matplotlib.pyplot as plt
import numpy as np
import os
import tensorflow as tf


_URL = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'
path_to_zip = tf.keras.utils.get_file('cats_and_dogs.zip', origin=_URL, extract=True)
PATH = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')

train_dir = os.path.join(PATH, 'train')
#'/Users/mustafamuratarat/.keras/datasets/cats_and_dogs_filtered/train'

BATCH_SIZE = 32
IMG_SIZE = (160, 160)

img_gen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

gen = img_gen.flow_from_directory(train_dir, target_size=IMG_SIZE, batch_size=BATCH_SIZE)
#<tensorflow.python.keras.preprocessing.image.DirectoryIterator at 0x7fb9fde3b250>

#gen.class_indices
#{'cats': 0, 'dogs': 1}

#gen.target_size
#(160, 160)

# gen.batch_size
# 32

# gen.num_classes
# 2

dataset = tf.data.Dataset.from_generator(
    lambda: gen,
    output_types = (tf.float32, tf.float32),
    output_shapes = ([None, 160, 160, 3], [None, 2]),
)

#list(dataset.take(1).as_numpy_iterator())

You can then pass the dataset object to any model.
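
One caveat when doing so: the Keras directory iterator loops over the data indefinitely, so a dataset built from it has no natural end and training generally needs an explicit step count. A small sketch of that calculation follows; `steps_per_epoch` here is an illustrative helper, the sample count is a made-up number, and `model` in the trailing comment is hypothetical.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once (the last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)


# Hypothetical numbers for illustration; with the iterator above one would
# likely read them from gen.samples and gen.batch_size instead.
steps = steps_per_epoch(num_samples=1000, batch_size=32)
print(steps)  # 32

# With a compiled Keras model (hypothetical `model`), the call would then look like:
# model.fit(dataset, steps_per_epoch=steps, epochs=5)
```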

ARAT
  • Good to see `output_shapes` and `output_types` set on `from_generator`… might work better as a comment or edit of the other answer though, with commentary on the relative advantage – A T Mar 17 '21 at 08:20