
I am trying to build an image classification program using AutoKeras, TensorFlow, and Pandas.

The code is as follows:

from keras_preprocessing.image import ImageDataGenerator
import autokeras as ak
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

# directory with subfolders (that contain other subfolders) that contain images
data_dir = "/home/jack/project/"

# dataframe initialization
dataframe = pd.read_excel("/home/jack/project/pathsandlabels.xlsx")

# splitting the dataset
train_dataframe = dataframe.sample(frac=0.75, random_state=200)
test_dataframe = dataframe.drop(train_dataframe.index)

# Augmenting it
datagen = ImageDataGenerator(rescale=1./255., horizontal_flip=True, shear_range=0.6, zoom_range=0.4,
                         validation_split=0.25)

# Setting up a train generator
train_generator = datagen.flow_from_dataframe(
    dataframe=train_dataframe,
    directory="/home/jack/project",
    x_col="filename",
    y_col="assessment",
    subset="training",
    seed=42,
    batch_size=16,
    shuffle=True,
    class_mode="binary",
    target_size=(224, 224)
)


# setting up a validation generator
validation_generator = datagen.flow_from_dataframe(
    dataframe=train_dataframe,
    directory="/home/jack/project/",
    x_col="filename",
    y_col="assessment",
    subset="validation",
    batch_size=16,
    seed=42,
    shuffle=True,
    class_mode="binary",
    target_size=(224, 224)
)

# Another augmentation but for test data
test_gen = ImageDataGenerator(rescale=1./255.)

# test generator set up
test_generator = test_gen.flow_from_dataframe(
    dataframe=test_dataframe,
    directory="/home/jack/project/",
    x_col="filename",
    y_col=None,
    batch_size=16,
    seed=42,
    shuffle=False,
    class_mode=None,
    target_size=(224, 224)
)


# this function will yield the variables we need to work with in order to create a train and test set
# it will iterate through the generator
def my_iterator(generator):
    for img_batch, targets_batch in generator:
        yield test_generator.batch_size, targets_batch


# Train and Validation set creation
# The first problem is here
# 1: Invalid argument: Value Error: 'generator' yielded an element of shape (16,224,224,3) where an element
# of shape (224,) was expected.
train_set = tf.data.Dataset.from_generator(lambda: my_iterator(train_generator), output_shapes=(224, 244),
                                       output_types=(tf.float32, tf.float32))

val_set = tf.data.Dataset.from_generator(lambda: my_iterator(validation_generator), output_shapes=(224, 224),
                                     output_types=(tf.float32, tf.float32))

# we check the output of both validation and train sets
print(train_set)
print(val_set)

# This piece of code is where the other two issues are:
# 2: squeeze(axis=2) gives this error: ValueError: cannot select an axis to squeeze out which has size not equal to one 
# 3: Issue 2 can be averted by setting axis=None, but the next problem is plt.show() gives an empty image. 
for image, label in train_set.take(1):
    print("Image shape: ", image.numpy().shape)
    print("Label: ", label.numpy().shape)
    plt.imshow(image.numpy()[0].squeeze(axis=2) * 255)
    plt.show()

clf = ak.ImageClassifier(overwrite=True, max_trials=1, seed=5)
clf.fit(x=train_set, epochs=20)
print(clf.evaluate(val_set))
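
For issues 2 and 3, here is a minimal sketch of how one image could be prepared for display. It assumes each batch has shape (batch, 224, 224, 3) and was already rescaled to [0, 1] by the generator. An RGB image has a channel axis of size 3, so squeeze(axis=2) cannot drop it, and multiplying by 255 pushes float values outside the [0, 1] range that plt.imshow expects for floats, which may explain the blank plot. The helper name to_displayable is hypothetical.

```python
import numpy as np


def to_displayable(batch, index=0):
    """Pick one image from a (batch, H, W, C) array and make it safe for plt.imshow."""
    image = np.asarray(batch)[index]
    # Only a grayscale image has a size-1 channel axis that can be squeezed out.
    if image.ndim == 3 and image.shape[-1] == 1:
        image = image.squeeze(axis=-1)
    # The generator already rescaled by 1/255, so no "* 255" here; just clip.
    return np.clip(image, 0.0, 1.0)


# Possible usage inside the loop from the question (not executed here):
#     plt.imshow(to_displayable(image.numpy()))
#     plt.show()
```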

I noted the issues as comments in the code, but I will explain them again here.

The biggest issue is the first one: ValueError: 'generator' yielded an element of shape (16,224,224,3) where an element of shape (224,) was expected. This happens when I try to initialize my training set.

What I tried:

  1. Changing output_shapes to (224,224,3) and (16,224,224,3). This didn't help; it threw a different error saying "The two sequences do not have the same length".
  2. Deleting batch_size from train_generator. This reset it to the default of 32, which my PC can't handle.
  3. Changing target_size within the generators to (224,224,3) and (16,224,224,3). This didn't work.
  4. Changing the number of variables that my_iterator yields. This didn't work (error message: "expected n values to unpack, got 2", where n is either 3 or 4).
  5. Changing batch_size to a number that evenly divides the total number of images. This didn't work and threw the original error message.
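
A sketch of how the generator could be wrapped so that the declared shapes match what is yielded. It assumes each Keras batch is an (images, labels) pair, with images of shape (batch, 224, 224, 3) and labels of shape (batch,). Two things differ from the posted code. First, output_shapes is nested to mirror output_types: a flat (224, 224, 3) has three entries against two types, which is likely where the "same length" error comes from. Second, the wrapper yields img_batch itself, where the posted my_iterator yields test_generator.batch_size. Keras iterators loop forever, so the hypothetical helper batches stops after a fixed number of steps; to_dataset is also a made-up name.

```python
def batches(generator, steps):
    """Yield exactly `steps` (images, labels) batches from a Keras-style iterator."""
    # zip() stops once range(steps) is exhausted, so an endless iterator is fine.
    for _, (img_batch, targets_batch) in zip(range(steps), generator):
        yield img_batch, targets_batch


def to_dataset(generator, steps, image_size=(224, 224), channels=3):
    """Wrap the iterator in a tf.data.Dataset whose elements are whole batches."""
    import tensorflow as tf  # imported lazily so `batches` stays TensorFlow-free

    return tf.data.Dataset.from_generator(
        lambda: batches(generator, steps),
        output_types=(tf.float32, tf.float32),
        # One shape per yielded component; None leaves the batch size flexible
        # because the last batch of an epoch can be smaller than batch_size.
        output_shapes=((None, *image_size, channels), (None,)),
    )


# Possible usage with the generators from the question (not executed here):
#     train_set = to_dataset(train_generator, steps=len(train_generator))
#     val_set = to_dataset(validation_generator, steps=len(validation_generator))
```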

How the data is stored: an Excel file with a single sheet and two columns, A and B, named filename and assessment. The filename column holds paths to the images (e.g. /subfolder/subfolder/subfolder/A2c3jc3291n.jpeg). The assessment column holds the classes; there are only two in this case.
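
One thing that may be worth checking, given that the stored filenames start with a slash: if the generator builds each full path with os.path.join (an assumption about its internals, not something confirmed here), a leading slash makes the join discard the directory part. A small sketch of that behavior, using posixpath so it behaves the same on every platform; the helper name resolve is hypothetical.

```python
import posixpath

DATA_DIR = "/home/jack/project/"
FILENAME = "/subfolder/subfolder/subfolder/A2c3jc3291n.jpeg"


def resolve(directory, filename):
    """Join directory and filename, treating the filename as relative."""
    return posixpath.join(directory, filename.lstrip("/"))


# An absolute second argument makes join() throw away the first one.
naive = posixpath.join(DATA_DIR, FILENAME)
fixed = resolve(DATA_DIR, FILENAME)

print(naive)  # /subfolder/subfolder/subfolder/A2c3jc3291n.jpeg
print(fixed)  # /home/jack/project/subfolder/subfolder/subfolder/A2c3jc3291n.jpeg
```

If that turns out to be the cause, stripping the leading slash from the filename column before building the generators would be one way to apply the same idea to the dataframe.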

desertnaut
Victor Simon
  • This is a ridiculously complicated way of using a directory iterator. You could do it in 4-5 lines like [this](https://stackoverflow.com/questions/64359945/how-can-i-explore-and-modify-the-created-dataset-from-tf-keras-preprocessing-ima/64372101#64372101). With small edits you could make it get filenames from a dataframe. – Nicolas Gervais Nov 18 '20 at 16:15
  • okay, I will give this a shot. thank you. – Victor Simon Nov 18 '20 at 20:31

0 Answers