3

I am actually working on a mini-project based on the CIFAR-10 dataset. I have loaded the data with tfds.load(...) and am practicing image augmentation techniques.

Since my dataset is a tf.data.Dataset object, real-time data augmentation seems hard to achieve, so I want to pass all the features into tf.keras.preprocessing.image.ImageDataGenerator.flow(...) to get real-time augmentation.

But the flow(...) method accepts NumPy arrays, which are in no way related to a tf.data.Dataset object.

Can somebody guide me in this regard (or suggest an alternative) and tell me how to proceed?

Are tf.image transformations real-time? If not, what would be the best approach other than ImageDataGenerator.flow(...)?

My code:

import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.preprocessing.image import ImageDataGenerator

splitting = tfds.Split.ALL.subsplit(weighted=(70, 20, 10))
dataset_cifar10, dataset_info = tfds.load(name='cifar10', 
                                          split=splitting, 
                                          as_supervised=True, 
                                          with_info=True)

train_dataset, valid_dataset, test_dataset = dataset_cifar10

BATCH_SIZE = 32

train_dataset = train_dataset.batch(batch_size=BATCH_SIZE)
train_dataset = train_dataset.prefetch(buffer_size=1)

image_generator = ImageDataGenerator(rotation_range=45, 
                                     width_shift_range=0.15, 
                                     height_shift_range=0.15, 
                                     zoom_range=0.2, 
                                     horizontal_flip=True, 
                                     vertical_flip=True, 
                                     rescale=1./255)

train_dataset_generator = image_generator.flow(...)

...
acesaif

2 Answers

2

Right after splitting the train and test datasets, you can iterate over the dataset and append each sample to a list, which you can then use with ImageDataGenerator. A complete use case is below:

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

cifar10_data, cifar10_info = tfds.load("cifar10", with_info=True, as_supervised=True)
train_data, test_data = cifar10_data['train'], cifar10_data['test']
NUM_CLASSES = 10

train_x = []
train_y = []
for sample in train_data:
    train_x.append(sample[0].numpy())
    train_y.append(tf.keras.utils.to_categorical(sample[1].numpy(), num_classes=NUM_CLASSES))

train_x = np.asarray(train_x)
train_y = np.asarray(train_y)

# DataGenerator
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    horizontal_flip=True)

# Fitting train_x data
datagen.fit(train_x)

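# Testing
EPOCHS = 1
BATCH_SIZE = 16
steps_per_epoch = len(train_x) // BATCH_SIZE
for e in range(EPOCHS):
    for step, (batch_x, batch_y) in enumerate(datagen.flow(train_x, train_y, batch_size=BATCH_SIZE)):
        if step >= steps_per_epoch:
            # flow() yields batches forever, so stop after one pass over the data
            break
        print(batch_x, batch_y)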
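
On the second part of the question: tf.image ops applied through Dataset.map run every time an element is read from the dataset, so random ops such as tf.image.random_flip_left_right produce a different result on each epoch. That gives on-the-fly augmentation without converting to NumPy first. Below is a minimal sketch, not a definitive pipeline. It assumes synthetic tensors with CIFAR-10 shapes in place of the tfds split, and the augment helper is a hypothetical name introduced only for illustration.

```python
import tensorflow as tf


def augment(image, label):
    """Randomly perturb one image; re-executed each time the element is read."""
    # uint8 [0, 255] -> float32 [0, 1]
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    # random_brightness can push values outside [0, 1], so clip them back
    image = tf.clip_by_value(image, 0.0, 1.0)
    return image, label


# Assumption: stand-in data with CIFAR-10 shapes; swap in the tfds train split.
images = tf.cast(
    tf.random.uniform((64, 32, 32, 3), maxval=256, dtype=tf.int32), tf.uint8)
labels = tf.random.uniform((64,), maxval=10, dtype=tf.int32)

# tf.data.AUTOTUNE needs TF >= 2.4 (older releases: tf.data.experimental.AUTOTUNE)
train_ds = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .shuffle(64)
    .map(augment, num_parallel_calls=tf.data.AUTOTUNE)  # augment before batching
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)

batch_x, batch_y = next(iter(train_ds))
print(batch_x.shape, batch_y.shape)
```

A dataset built this way can be passed straight to model.fit(train_ds, epochs=...). Avoid placing .cache() after the map call, since that would freeze the first epoch's augmented images and reuse them.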
Kaushik Roy
  • What about the last question which has two sub-parts in it? – acesaif Nov 29 '19 at 12:57
  • As per the official [documentation](https://keras.io/preprocessing/image/), ImageDataGenerator generates batches of tensor image data with real-time data augmentation. – Kaushik Roy Nov 29 '19 at 13:07
0
import tensorflow as tf
import tensorflow_datasets as tfds
tfds.disable_progress_bar()
from tensorflow.keras.preprocessing.image import ImageDataGenerator

splits = ['train[:70%]', 'train[70%:90%]', 'train[90%:]']
BATCH_SIZE = 64
dataset_cifar10, dataset_info = tfds.load(name='cifar10', 
                                          split=splits, 
                                          as_supervised=True, 
                                          with_info=True,
                                          batch_size=BATCH_SIZE)

train_dataset, valid_dataset, test_dataset = dataset_cifar10

image_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=45, 
    width_shift_range=0.15, 
    height_shift_range=0.15, 
    zoom_range=0.2, 
    horizontal_flip=True, 
    vertical_flip=True, 
    rescale=1./255)

# custom function to wrap image data generator with raw dataset
def tfds_imgen(ds, imgen, batch_size, num_batches):
    # ds is a NumPy iterator over chunks already batched by tfds.load,
    # so iterate it directly (an iterator has no .batch()/.prefetch())
    for images, labels in ds:
        flow = imgen.flow(images, labels, batch_size=batch_size)
        for _ in range(num_batches):
            yield next(flow)

# call the custom function to get the augmented data generator
train_dataset_generator = tfds_imgen(
    train_dataset.as_numpy_iterator(), 
    image_generator,
    batch_size=32,
    num_batches=BATCH_SIZE // 32
)       
Li-Pin Juan