I am building a basic image classification project. However, my dataset is a dictionary with label names as keys and lists of image paths as values: {'label_name1': ['imagepath1', 'imagepath2',....], 'label_name2': ['imagepath1', 'imagepath2',....],....}

How can I preprocess this kind of dataset and later use it in a Sequential classification model?

1 Answer

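You can flatten the dictionary into two parallel lists, one of image paths and one of labels, with a pair of nested for loops, so that the label at each index belongs to the image path at the same index.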

data = {
    'label_name1': ['path1', 'path2'],
    'label_name2': ['path3', 'path4']
}
train_images, train_labels = [], []

for label in data:
    for image in data[label]:
        train_images.append(image)
        train_labels.append(label)

print(train_images) # ['path1', 'path2', 'path3', 'path4']
print(train_labels) # ['label_name1', 'label_name1', 'label_name2', 'label_name2']

Bonus: You can then shuffle the images and labels together, keeping each path paired with its label, by zipping the two lists, shuffling the pairs, and unzipping them again.

from random import shuffle
temp = list(zip(train_images, train_labels))
shuffle(temp)
train_images, train_labels = [list(i) for i in zip(*temp)]

print(train_images) # e.g. ['path3', 'path2', 'path1', 'path4'] (order is random)
print(train_labels) # e.g. ['label_name2', 'label_name1', 'label_name1', 'label_name2']
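
The labels are still strings at this point, and a classification model is normally trained on numeric class indices (for example, Keras's sparse categorical cross-entropy loss expects integer labels). As a minimal sketch, assuming the two lists look like the shuffled output above, you could map each label name to an integer. The names class_names, label_to_index and encoded_labels are just illustrative, not part of any library.

```python
# Hypothetical sample data in the same shape as the lists built above.
train_images = ['path3', 'path2', 'path1', 'path4']
train_labels = ['label_name2', 'label_name1', 'label_name1', 'label_name2']

# Sort the unique label names so the mapping is the same on every run.
class_names = sorted(set(train_labels))
label_to_index = {name: index for index, name in enumerate(class_names)}

# Replace every label string with its integer class index.
encoded_labels = [label_to_index[label] for label in train_labels]

print(class_names)     # ['label_name1', 'label_name2']
print(encoded_labels)  # [1, 0, 0, 1]
```

Keeping class_names around lets you turn a predicted index back into a label name later with class_names[index]. The image paths themselves would still need to be loaded and resized into arrays before training, which is not shown here.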