I have a custom file containing the paths to all my images and their labels which I load in a dataframe using:
MyIndex=pd.read_table('./MySet.txt')
MyIndex has two columns of interest ImagePath and ClassName
Next I do some train test split and encoding the output labels as:
images=[]
for index, row in MyIndex.iterrows():
img_path=basePath+row['ImageName']
img = image.load_img(img_path, target_size=(299, 299))
img_path=None
img_data = image.img_to_array(img)
img=None
images.append(img_data)
img_data=None
images[0].shape
Classes=Sample['ClassName']
OutputClasses=Classes.unique().tolist()
labels=Sample['ClassName']
images=np.array(images, dtype="float") / 255.0
(trainX, testX, trainY, testY) = train_test_split(images,labels, test_size=0.10, random_state=42)
trainX, valX, trainY, valY = train_test_split(trainX, trainY, test_size=0.10, random_state=41)
images=None
labels=None
encoder = LabelEncoder()
encoder=encoder.fit(OutputClasses)
encoded_Y = encoder.transform(trainY)
# convert integers to dummy variables (i.e. one hot encoded)
trainY = to_categorical(encoded_Y, num_classes=len(OutputClasses))
encoded_Y = encoder.transform(valY)
# convert integers to dummy variables (i.e. one hot encoded)
valY = to_categorical(encoded_Y, num_classes=len(OutputClasses))
encoded_Y = encoder.transform(testY)
# convert integers to dummy variables (i.e. one hot encoded)
testY = to_categorical(encoded_Y, num_classes=len(OutputClasses))
datagen=ImageDataGenerator(rotation_range=90,horizontal_flip=True,vertical_flip=True,width_shift_range=0.25,height_shift_range=0.25)
datagen.fit(trainX,augment=True)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
batch_size=128
model.fit_generator(datagen.flow(trainX,trainY,batch_size=batch_size), epochs=500,
steps_per_epoch=trainX.shape[0]//batch_size,validation_data=(valX,valY))
The problem I face that the data loaded in one go is too large to fit in current machine memory and so I am unable to work with the complete dataset.
I have tried to work with the datagenerator but do not want to follow he directory conventions it follows and also cannot eradicate the augmentation part.
The question is that is there a way to load batches from the disk ensuring the two stated conditions.