I am working on the PlantVillage dataset for a plant disease classification task. The output is multiclass, and the classes are named like 'Tomato___healthy'. I want to split each class name so that every sample carries two labels (in the example above, 'Tomato' as a species label and 'healthy' as a disease label) for multitask learning. Below is what I am trying.
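To make the target concrete, the split I have in mind is simply on the '___' separator (illustrative only):

# Example of the intended label split.
species, disease = 'Tomato___healthy'.split('___')
# species -> 'Tomato', disease -> 'healthy'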
First, I define the batch size and use image_dataset_from_directory to fetch the data and label it automatically:
import numpy as np
from tensorflow.keras.utils import image_dataset_from_directory

BATCH_SIZE = 32
IMG_SIZE = (255, 255)
data_dir = "/content/plantvillage dataset/color"

train_dataset = image_dataset_from_directory(data_dir,
                                             shuffle=True,
                                             label_mode='categorical',
                                             validation_split=0.2,
                                             batch_size=BATCH_SIZE,
                                             seed=42,
                                             subset="training",
                                             image_size=IMG_SIZE)
validation_dataset = image_dataset_from_directory(data_dir,
                                                  shuffle=True,
                                                  label_mode='categorical',
                                                  validation_split=0.2,
                                                  batch_size=BATCH_SIZE,
                                                  seed=42,
                                                  subset="validation",
                                                  image_size=IMG_SIZE)
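For reference, the one-hot labels index into the directory names, which (if I read the Keras docs correctly) can be recovered from the dataset like this:

# Each one-hot label at position i corresponds to class_names[i],
# e.g. 'Tomato___healthy'.
class_names = train_dataset.class_names
print(class_names[:3])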
Second, I tried the code below to fetch the images together with their labels:
# Materialize the whole validation split in memory (this is where it crashes).
y = np.concatenate([y for x, y in validation_dataset], axis=0)
x = np.concatenate([x for x, y in validation_dataset], axis=0)
Last, I want to use this generator to produce the two labels for every sample:
def generate_data(x, y, batch_size=32):
    num_examples = len(y)
    while True:
        x_batch = np.zeros((batch_size, 255, 255, 3))
        y_batch = np.zeros((batch_size,))
        c_batch = np.zeros((batch_size,))
        for i in range(batch_size):
            index = np.random.randint(0, num_examples)
            image = x[index]
            # Split the class name into its species and disease parts.
            specie, disease = y[index].split('___')
            x_batch[i] = image
            y_batch[i] = specie
            c_batch[i] = disease
        yield x_batch, [y_batch, c_batch]
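The intention is to feed this into a two-output model, roughly like the following (model here is a hypothetical network with one head per task, not something I have built yet):

# Hypothetical two-headed model; fit in Keras accepts Python generators.
model.fit(generate_data(x, y, batch_size=BATCH_SIZE),
          steps_per_epoch=len(y) // BATCH_SIZE,
          epochs=10)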
My file structure is as follows:
color/
  - Tomato___healthy/
      - iweoqwd.jpg
      - weqwjeh.jpg
  - Tomato___Tomato_Yellow_Leaf_Curl_Virus/
      - iweoqwd.jpg
      - weqwjeh.jpg
I am stuck on the second step because of a memory crash: concatenating the whole split into NumPy arrays loads every image into memory at once, which the runtime cannot handle. How can I overcome this, and is there an easier way to split the class into two labels for every sample?
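One direction I have been considering, though I am not sure it is idiomatic, is to remap the labels inside the tf.data pipeline instead of materializing everything with NumPy. A rough sketch of what I mean (the lookup-table construction is my own assumption, untested):

import numpy as np
import tensorflow as tf

class_names = train_dataset.class_names
species_names = sorted({n.split('___')[0] for n in class_names})
disease_names = sorted({n.split('___')[1] for n in class_names})

# For every original class index, precompute the two sub-label indices.
species_table = tf.constant([species_names.index(n.split('___')[0]) for n in class_names], dtype=tf.int64)
disease_table = tf.constant([disease_names.index(n.split('___')[1]) for n in class_names], dtype=tf.int64)

def split_label(image, onehot):
    # Recover the class index from the one-hot vector, then look up
    # the species and disease indices for the whole batch.
    idx = tf.argmax(onehot, axis=-1)
    return image, (tf.gather(species_table, idx), tf.gather(disease_table, idx))

multitask_train = train_dataset.map(split_label)

If this works, the two heads could be trained with sparse_categorical_crossentropy, since the mapped labels are integer indices rather than one-hot vectors. Does this look like a reasonable way to avoid loading everything into memory?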