I am trying to use TensorFlow Extended (TFX) to build a pipeline for my image classification model. I am reading and transforming images from a local directory with the following code:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescaling and augmentation for the training images.
train_datagen = ImageDataGenerator(rescale=1.0/255.,
                                   rotation_range=40,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True,
                                   fill_mode='nearest')
train_generator = train_datagen.flow_from_directory(directory=train_data_path,
                                                    batch_size=32,
                                                    class_mode='categorical',
                                                    target_size=(150, 150))

# Same rescaling and augmentation applied to the validation images.
validation_datagen = ImageDataGenerator(rescale=1.0/255.,
                                        rotation_range=40,
                                        width_shift_range=0.2,
                                        height_shift_range=0.2,
                                        shear_range=0.2,
                                        zoom_range=0.2,
                                        horizontal_flip=True,
                                        fill_mode='nearest')
validation_generator = validation_datagen.flow_from_directory(directory=test_data_path,
                                                              batch_size=32,
                                                              class_mode='categorical',
                                                              target_size=(150, 150))
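Each generator yields batches of (images, one-hot labels); assuming at least one full batch and the six classes listed below, the shapes come out like this:

images, labels = next(train_generator)
print(images.shape)   # (32, 150, 150, 3), pixel values rescaled to [0, 1]
print(labels.shape)   # (32, 6), one-hot labels for the 6 classes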
The data directory looks like this:
.
└── Data
    ├── test
    ├── train
    │   ├── buildings
    │   ├── forest
    │   ├── glacier
    │   ├── mountain
    │   ├── sea
    │   └── street
    └── validation
        ├── buildings
        ├── forest
        ├── glacier
        ├── mountain
        ├── sea
        └── street
Now, almost all TensorFlow Extended tutorials and documentation pages show how to read and transform data from a CSV file using CsvExampleGen, like the following:
import os
import tempfile
import urllib.request

from tfx import v1 as tfx
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

# Download the Chicago Taxi sample CSV and ingest it with CsvExampleGen.
_data_root = tempfile.mkdtemp(prefix='tfx-data')
DATA_PATH = 'https://raw.githubusercontent.com/tensorflow/tfx/master/tfx/examples/chicago_taxi_pipeline/data/simple/data.csv'
_data_filepath = os.path.join(_data_root, "data.csv")
urllib.request.urlretrieve(DATA_PATH, _data_filepath)

context = InteractiveContext()
example_gen = tfx.components.CsvExampleGen(input_base=_data_root)
context.run(example_gen, enable_cache=True)
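From what I can tell after running this, CsvExampleGen serializes the rows into TFRecord files of tf.train.Example protos under the artifact URI; I checked roughly like this (the Split-train subdirectory and GZIP compression are just what I observed on my machine, so treat them as my assumption):

import glob

import tensorflow as tf

# Locate the examples artifact produced by the run above (the URI is machine-specific).
artifact = example_gen.outputs['examples'].get()[0]
print(artifact.split_names, artifact.uri)

# Read one record back; it parses as a serialized tf.train.Example.
train_files = glob.glob(os.path.join(artifact.uri, 'Split-train', '*'))
dataset = tf.data.TFRecordDataset(train_files, compression_type='GZIP')
for raw_record in dataset.take(1):
    example = tf.train.Example()
    example.ParseFromString(raw_record.numpy())
    print(example)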
I could not find a proper way to build a pipeline that reads and transforms an image dataset from a folder like this. The closest I have come is the sketch below, but I am not sure it is the intended approach. Does anyone have a better solution, tutorial, or documentation for this?
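What I have been experimenting with (purely a sketch, and I am not sure it is the intended TFX way) is to pre-convert the class folders into TFRecord files of serialized tf.train.Example protos and then ingest those with ImportExampleGen. The images_to_tfrecord helper and the tfrecord_root path below are names I made up for illustration:

import os

import tensorflow as tf
from tfx import v1 as tfx

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def images_to_tfrecord(image_dir, output_path):
    """Write every image under image_dir/<class_name>/ into one TFRecord file."""
    class_names = sorted(os.listdir(image_dir))
    with tf.io.TFRecordWriter(output_path) as writer:
        for label, class_name in enumerate(class_names):
            class_dir = os.path.join(image_dir, class_name)
            for file_name in os.listdir(class_dir):
                image_bytes = open(os.path.join(class_dir, file_name), 'rb').read()
                example = tf.train.Example(features=tf.train.Features(feature={
                    'image_raw': _bytes_feature(image_bytes),
                    'label': _int64_feature(label),
                }))
                writer.write(example.SerializeToString())

# tfrecord_root is a made-up staging directory for the generated TFRecord file.
tfrecord_root = os.path.join('Data', 'tfrecords')
os.makedirs(tfrecord_root, exist_ok=True)
images_to_tfrecord(os.path.join('Data', 'train'), os.path.join(tfrecord_root, 'images.tfrecord'))

# ImportExampleGen ingests pre-made TFRecord files of tf.train.Example protos.
example_gen = tfx.components.ImportExampleGen(input_base=tfrecord_root)
context.run(example_gen, enable_cache=True)

This loses the on-the-fly augmentation that ImageDataGenerator gives me, so I would still prefer a more standard way to plug an image folder into a TFX pipeline.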