I am trying to build an image classification program using AutoKeras, TensorFlow, and Pandas.
The code is as follows:
from keras_preprocessing.image import ImageDataGenerator
import autokeras as ak
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
# directory with subfolders (that contain other subfolders) that contain images
data_dir = "/home/jack/project/"
# dataframe initialization
dataframe = pd.read_excel("/home/jack/project/pathsandlabels.xlsx")
# splitting the dataset
train_dataframe = dataframe.sample(frac=0.75, random_state=200)
test_dataframe = dataframe.drop(train_dataframe.index)
# Augmenting it
datagen = ImageDataGenerator(rescale=1./255., horizontal_flip=True, shear_range=0.6, zoom_range=0.4,
                             validation_split=0.25)
# Setting up a train generator
train_generator = datagen.flow_from_dataframe(
    dataframe=train_dataframe,
    directory="/home/jack/project",
    x_col="filename",
    y_col="assessment",
    subset="training",
    seed=42,
    batch_size=16,
    shuffle=True,
    class_mode="binary",
    target_size=(224, 224)
)
# setting up a validation generator
validation_generator = datagen.flow_from_dataframe(
    dataframe=train_dataframe,
    directory="/home/jack/project/",
    x_col="filename",
    y_col="assessment",
    subset="validation",
    batch_size=16,
    seed=42,
    shuffle=True,
    class_mode="binary",
    target_size=(224, 224)
)
# Another augmentation but for test data
test_gen = ImageDataGenerator(rescale=1./255.)
# test generator set up
test_generator = test_gen.flow_from_dataframe(
    dataframe=test_dataframe,
    directory="/home/jack/project/",
    x_col="filename",
    y_col=None,
    batch_size=16,
    seed=42,
    shuffle=False,
    class_mode=None,
    target_size=(224, 224)
)
# this function will yield the variables we need to work with in order to create a train and test set
# it will iterate through the generator
def my_iterator(generator):
    for img_batch, targets_batch in generator:
        yield test_generator.batch_size, targets_batch
# Train and Validation set creation
# The first problem is here
# 1: Invalid argument: Value Error: 'generator' yielded an element of shape (16,224,224,3) where an element
# of shape (224,) was expected.
train_set = tf.data.Dataset.from_generator(lambda: my_iterator(train_generator), output_shapes=(224, 244),
                                           output_types=(tf.float32, tf.float32))
val_set = tf.data.Dataset.from_generator(lambda: my_iterator(validation_generator), output_shapes=(224, 224),
                                         output_types=(tf.float32, tf.float32))
# we check the output of both validation and train sets
print(train_set)
print(val_set)
# This piece of code is where the other two issues are:
# 2: squeeze(axis=2) gives this error: ValueError: cannot select an axis to squeeze out which has size not equal to one
# 3: Issue 2 can be averted by setting axis=None, but the next problem is plt.show() gives an empty image.
for image, label in train_set.take(1):
    print("Image shape: ", image.numpy().shape)
    print("Label: ", label.numpy().shape)
    plt.imshow(image.numpy()[0].squeeze(axis=2) * 255)
    plt.show()
clf = ak.ImageClassifier(overwrite=True, max_trials=1, seed=5)
clf.fit(x=train_set, epochs=20)
print(clf.evaluate(val_set))
I mentioned the issues I face as comments in the code, but I will explain again.
The biggest issue is the first one: ValueError: 'generator' yielded an element of shape (16,224,224,3) where an element of shape (224,) was expected. This happens when I try to initialize my test set.
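One possible reading of this error, offered as a sketch and not a confirmed diagnosis: in output_shapes=(224, 244) each integer appears to be read as the shape of one yielded element, so the image batch of shape (16,224,224,3) is compared against (224,). The example below uses a hypothetical fake_batches generator in place of the Keras one and declares one tf.TensorSpec per yielded element through output_signature. That argument assumes TensorFlow 2.4 or newer; with the older arguments the equivalent would be nested tuples such as output_shapes=((None, 224, 224, 3), (None,)).

```python
import numpy as np
import tensorflow as tf

BATCH, H, W, C = 16, 224, 224, 3


def fake_batches(num_batches=2):
    """Hypothetical stand-in for a Keras generator with class_mode="binary"."""
    for _ in range(num_batches):
        images = np.random.rand(BATCH, H, W, C).astype("float32")
        labels = np.random.randint(0, 2, size=(BATCH,)).astype("float32")
        yield images, labels


# One TensorSpec per yielded element. None leaves the batch dimension open,
# so a smaller final batch is still accepted.
signature = (
    tf.TensorSpec(shape=(None, H, W, C), dtype=tf.float32),
    tf.TensorSpec(shape=(None,), dtype=tf.float32),
)

dataset = tf.data.Dataset.from_generator(fake_batches, output_signature=signature)

# Pull one element to confirm that the declared and yielded shapes agree.
first_images, first_labels = next(iter(dataset))
print(first_images.shape, first_labels.shape)
```

Because the generator already yields whole batches, the resulting dataset would not need a further .batch() call.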
What I tried:
- Changing output_shapes to (224,224,3) and (16,224,224,3). This didn't help; it threw a different error saying that "The two sequences do not have the same length".
- Deleting batch_size from train_generator. This set it back to the default of 32, which my PC can't handle.
- Changing target_size within the generators to (224,224,3) and (16,224,224,3). This didn't work.
- Changing the number of variables that my_iterator yields. This didn't work (error message: expected n values to unpack, where n is either 3 or 4, got 2).
- Changing batch_size to a number that divides the total number of images evenly. This didn't work; it throws the original error message.
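Two details in the attempts above may be worth checking. First, my_iterator as posted yields test_generator.batch_size (an integer) in the position where the image batch would be expected. Second, Keras iterators keep producing batches indefinitely, so a plain for loop over one never ends. The sketch below shows a wrapper that yields (images, labels) pairs for exactly one pass. FakeKerasIterator and one_epoch are hypothetical names used only for illustration, and the fake class merely imitates the len() and next() behaviour of the real iterator.

```python
import numpy as np


class FakeKerasIterator:
    """Hypothetical stand-in for the object returned by flow_from_dataframe.

    Like the Keras iterator, it reports its number of batches through len()
    and never stops on its own when next() is called repeatedly.
    """

    def __init__(self, num_batches=3, batch_size=16):
        self.num_batches = num_batches
        self.batch_size = batch_size

    def __len__(self):
        return self.num_batches

    def __iter__(self):
        return self

    def __next__(self):
        images = np.zeros((self.batch_size, 224, 224, 3), dtype="float32")
        labels = np.zeros((self.batch_size,), dtype="float32")
        return images, labels


def one_epoch(generator):
    """Yield exactly len(generator) batches of (images, labels)."""
    for _ in range(len(generator)):
        img_batch, targets_batch = next(generator)
        yield img_batch, targets_batch


batches = list(one_epoch(FakeKerasIterator()))
print(len(batches), batches[0][0].shape, batches[0][1].shape)
```

Each yielded element is a 2-tuple, so any shape or type declaration passed to tf.data.Dataset.from_generator would need to mirror that 2-tuple structure. A flat tuple such as (224,224,3) has three entries, which may be what the "not the same length" error refers to.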
How the data is stored: an Excel file with a single sheet and two columns, A and B, named filename and assessment. The filename column holds the paths to the images (e.g. /subfolder/subfolder/subfolder/A2c3jc3291n.jpeg), and the assessment column holds the classes. There are only two classes in this case.
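For issues 2 and 3 noted in the code comments, a NumPy-only sketch may help narrow things down. It assumes a single RGB image of shape (224, 224, 3) with float values in [0, 1], which is what rescale=1./255 would be expected to produce. The array here is random data, not one of the real images.

```python
import numpy as np

# Hypothetical RGB image already rescaled to [0, 1].
image = np.random.rand(224, 224, 3).astype("float32")

# squeeze(axis=2) only works when that axis has length 1 (e.g. grayscale).
# Here the axis has length 3, so NumPy raises ValueError.
try:
    image.squeeze(axis=2)
    squeeze_failed = False
except ValueError:
    squeeze_failed = True

# squeeze() with no axis leaves a (224, 224, 3) array unchanged.
unchanged = image.squeeze()

# Multiplying floats by 255 pushes most values far above 1.0.
scaled = image * 255

# Two display-friendly alternatives: keep the floats in [0, 1] as they are,
# or convert to integers in [0, 255].
as_uint8 = (image * 255).astype("uint8")

print(squeeze_failed, unchanged.shape, as_uint8.dtype)
```

Matplotlib's imshow clips float RGB data to the range [0, 1], so passing the array multiplied by 255 would likely render as an almost entirely white image. Passing the unscaled float array, or the uint8 version, may avoid the seemingly empty plot.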