If you're using TensorFlow 2, then there are two approaches you could try:
- Using .flow_from_directory(): as the docs say, you can simply pass in the path to the directory holding your images, and the resulting generator is ready to be passed to model.fit(). Here is the example that's provided in the TensorFlow documentation I linked above (with some additional comments for clarity):
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Set the augmentations the data generators will do
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# The validation/test data should only be rescaled, not augmented
test_datagen = ImageDataGenerator(rescale=1./255)

# Instantiate DirectoryIterators - these yield the batches of data samples + their labels
train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    'data/validation',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

# Train a (previously compiled) Sequential model
model.fit(
    train_generator,
    steps_per_epoch=2000,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=800)
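Note that the snippet above assumes model already exists. Just so the call to model.fit() works as written, here is a minimal placeholder model you could compile beforehand; the architecture itself is my own filler, not part of the docs example:

import tensorflow as tf

# A small placeholder CNN matching the generators' output:
# 150x150 RGB images, binary labels (class_mode='binary')
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])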
- Using tf.data.Dataset.from_generator: this approach might be more convenient for you if you want to take advantage of the tf.data API and your dataset hasn't already been split into training and test sets. Here's an example (from a different page in the docs) of how it works:
import tensorflow as tf

# This example uses an image dataset that has NOT been split into train/test yet
flowers = tf.keras.utils.get_file(
    'flower_photos',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True)

# Like before, set the data augmentations
img_gen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255, rotation_range=20)

# (Optional) Double-check the dimensions of a single batch
images, labels = next(img_gen.flow_from_directory(flowers))
print(images.dtype, images.shape)  # float32 (32, 256, 256, 3)
print(labels.dtype, labels.shape)  # float32 (32, 5)

# Now, you can make a dataset with the augmentations
ds = tf.data.Dataset.from_generator(
    lambda: img_gen.flow_from_directory(flowers),
    output_types=(tf.float32, tf.float32),
    output_shapes=([32, 256, 256, 3], [32, 5])
)
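If you want to sanity-check the resulting dataset, a quick optional way (my own addition, not from the docs) is to inspect its element spec or pull a single batch:

# Confirm the dataset yields (image_batch, label_batch) pairs of the expected shape
print(ds.element_spec)

for image_batch, label_batch in ds.take(1):
    print(image_batch.shape, label_batch.shape)  # (32, 256, 256, 3) (32, 5)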
Of course, you might still be wondering, "How are we going to split this ds variable into training and test sets?" Fortunately, Angel Igareta has written a great blog post on this topic. Below I will include just the code snippet that solves our problem:
def get_dataset_partitions_tf(ds, ds_size, train_split=0.8, val_split=0.1, test_split=0.1, shuffle=True, shuffle_size=10000):
    """Credit to Angel Igareta at https://towardsdatascience.com/how-to-split-a-tensorflow-dataset-into-train-validation-and-test-sets-526c8dd29438 for this code."""
    assert (train_split + test_split + val_split) == 1

    if shuffle:
        # Specify seed to always have the same split distribution between runs
        ds = ds.shuffle(shuffle_size, seed=12)

    train_size = int(train_split * ds_size)
    val_size = int(val_split * ds_size)

    train_ds = ds.take(train_size)
    val_ds = ds.skip(train_size).take(val_size)
    test_ds = ds.skip(train_size).skip(val_size)

    return train_ds, val_ds, test_ds
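To call this with the ds built above, you need to supply ds_size yourself, since a generator-backed dataset has no known cardinality. One way (my own suggestion, not from the blog post) is to take the number of batches from the DirectoryIterator:

# Number of batches the DirectoryIterator produces per pass over the data
ds_size = len(img_gen.flow_from_directory(flowers))

train_ds, val_ds, test_ds = get_dataset_partitions_tf(ds, ds_size)

# Note: a Keras DirectoryIterator loops indefinitely, so you may also want to
# cap the final split, e.g. test_ds = test_ds.take(int(0.1 * ds_size))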
In this way, you will be able to pass your dataset to model.fit(), and TensorFlow will essentially do the data augmentations for you as it trains.
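For instance, something along these lines (the epoch count is arbitrary, and your model's input shape and output layer would of course need to match this dataset: 256x256 images, 5 classes):

# train_ds / val_ds come from get_dataset_partitions_tf() above
model.fit(train_ds,
          validation_data=val_ds,
          epochs=10)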
Last but not least - in your case, I believe you'll want to pass featurewise_std_normalization=True to the ImageDataGenerator constructor. Let me know if I missed something in your question, but I don't think there actually is a parameter named feature_std_normalization for it.
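One caveat with that flag: the featurewise statistics have to be computed before the generator can apply them, which is done by calling fit() on some sample data. Roughly like this (x_sample is a hypothetical NumPy array of your training images, not something from your question):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(featurewise_center=True,
                             featurewise_std_normalization=True)

# fit() computes the dataset-wide mean/std that the generator will use;
# x_sample should have shape (num_samples, height, width, channels)
datagen.fit(x_sample)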