The model takes four inputs and gives one output. Of the four inputs, two are numerical, one is categorical, and one is an image. The output is binary (0 or 1). I need to create a custom data generator that reads these inputs from a dataframe and feeds them into the model.

I feed the images into a CNN model. The image dataset is too large to feed into the model without a data generator.

How can I feed the images into the model in batches? It would be very helpful to learn how to create custom data generators for any specific model.

Thank You.

  • there is a good tutorial on this at https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly – Gerry P Nov 03 '22 at 21:45

1 Answer

You might not need to use tf.keras.utils.Sequence. I think you can do this with ImageDataGenerator.flow_from_dataframe. Let's assume you have a dataframe called df with the following columns:

column 0 is the filepaths column, which contains the full path to each image file
column 1 is the first numerical data column, named num1
column 2 is the second numerical data column, named num2
column 3 is the categorical data column, named cat

OK, now create a list of those column names (as strings):

input_list=['num1', 'num2', 'cat']

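Now create the generator:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

bs=30 # batch size
img_size=(224,224) # image size to use
gen=ImageDataGenerator(rescale=1/255) # scale pixel values to [0, 1]
# note: the keyword is x_col (default 'filename'), not xcol
train_gen=gen.flow_from_dataframe(df, x_col='filepaths', y_col=input_list, target_size=img_size, batch_size=bs, shuffle=True, seed=123, class_mode='raw', color_mode='rgb')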

Note: make sure class_mode is set to 'raw'. To test the generator, try this code:

images, labels=next(train_gen)
print(images.shape) # should be (30, 224, 224, 3)
print(labels.shape) # should be (30, 3)
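
The generator above returns the three extra columns in the labels position, whereas the model in the question expects them as inputs next to the image, plus a binary target. One possible way to bridge that gap is a small wrapper around the generator. This is only a sketch under assumptions: `split_raw_batches` is a hypothetical helper, the target column (called `label` here purely for illustration) is assumed to be appended as the last entry of `y_col`, and every column is assumed to be numeric already.

```python
import numpy as np

def split_raw_batches(raw_gen, n_inputs=3):
    """Turn (images, raw) batches into ((images, in_1, ..., in_n), target) batches.

    Assumes the first `n_inputs` columns of `raw` are extra model inputs
    and the last column of `raw` is the binary target.
    """
    for images, raw in raw_gen:
        # Cast everything to float32; assumes all columns are already numeric.
        raw = np.asarray(raw, dtype="float32")
        # One (batch, 1) array per extra input, in column order.
        extra_inputs = tuple(raw[:, i:i + 1] for i in range(n_inputs))
        target = raw[:, -1]  # shape (batch,)
        yield (images, *extra_inputs), target
```

With `y_col=input_list + ['label']`, something like `model.fit(split_raw_batches(train_gen), steps_per_epoch=len(train_gen), epochs=...)` should then work, provided the order of the tuple matches the order of the model's input layers. `steps_per_epoch` is needed because the underlying iterator loops indefinitely.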

I have used this approach where all the columns in input_list were numeric, and I was able to train a model. I am not sure whether it will work for a mixture of numeric and categorical inputs, but I think it will. Of course, you may first want to partition df into train_df, test_df, and valid_df using sklearn's train_test_split. In that case you will want to create train, test, and validation generators. In the test generator, set shuffle=False. Let me know if this works.
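
On the question of mixing numeric and categorical columns: as far as I can tell, class_mode='raw' simply hands back the y_col values as a NumPy array, so a string-valued cat column would probably need to be converted to numbers first. A minimal sketch of one way to do that with NumPy follows; `encode_categories` is a hypothetical helper name, and the colour labels are made-up example data.

```python
import numpy as np

def encode_categories(values):
    """Map category labels to integer codes 0..k-1, in sorted label order."""
    categories, codes = np.unique(np.asarray(values), return_inverse=True)
    return codes.astype("int64"), categories.tolist()

# Made-up example labels standing in for the cat column.
codes, categories = encode_categories(["red", "blue", "red", "green"])

# Optional one-hot version: one row per sample, one column per category.
one_hot = np.eye(len(categories), dtype="float32")[codes]
```

The integer codes could replace the original column before the generator is built, for example `df['cat'] = encode_categories(df['cat'])[0]`. Whether integer codes or one-hot vectors fit better depends on how the model consumes the categorical input, which the question does not say.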

Gerry P
  • Thanks for your answer. I am trying out this approach, but I got a KeyError: 'filename' error. The names passed as xcol and y_col are not wrong; those columns are present in my dataset. – Abdullah Al Munem Nov 04 '22 at 09:21
  • I read the Stanford tutorial you mentioned and some others as well. I wrote the generator using tf.keras.utils.Sequence, but I got an error while fitting the model. The error is "ValueError: Dimension 0 in both shapes must be equal, but are 2 and 1. Shapes are [2] and [1]. for '{{node AssignAddVariableOp_8}} = AssignAddVariableOp[dtype=DT_FLOAT](AssignAddVariableOp_8/resource, Sum_7)' with input shapes: [], [1]." – Abdullah Al Munem Nov 04 '22 at 11:05