I am attempting to take folders of images and convert them into a single HDF5 file for use in a classification learning model. Each image should be paired with the name of its folder as its classification label. I have 10 folders of images, and they all need to end up in one large HDF5 file.
I have searched online for answers to this question and really can't find anything that helps. The only examples I have seen create an HDF5 file from a single folder, and they say nothing about handling multiple folders or appending to an existing HDF5 file. It doesn't matter what language this is programmed in, as I only plan to run the script once to create the HDF5 file.
For clarity, the end result should be an HDF5 file with two groups, one for the labels and the other for the data. Each group should contain an individual dataset per image/label, and matching datasets should share the same name/number.
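From reading the h5py docs, I think creating that layout would look something like the sketch below, where the dataset name '0' and the label string are just placeholders, but I'm not sure this is the right approach:

import h5py
import numpy as np

# a throwaway array standing in for a real decoded image
img = np.zeros((30, 30, 3), dtype=np.uint8)

with h5py.File('example.h5', 'w') as h5f:
    data_grp = h5f.create_group('data')      # group holding the image arrays
    label_grp = h5f.create_group('labels')   # group holding the class names
    # calling create_dataset on a group object places the dataset inside
    # that group; the matching name ('0' here) ties each image to its label
    data_grp.create_dataset('0', data=img)
    label_grp.create_dataset('0', data='example_label')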
Update: I have found and modified the code below, which iterates through a folder and creates a dataset for each image. However, I don't know how to add those datasets to a specific group within the HDF5 file.
import glob
import h5py
import cv2

IMG_WIDTH = 30
IMG_HEIGHT = 30

h5file = 'test.h5'

# keep the paths so each file can be read, not just counted
img_paths = glob.glob('./*.jpeg')
nfiles = len(img_paths)
print(f'count of image files nfiles={nfiles}')

with h5py.File(h5file, 'w') as h5f:
    for x, path in enumerate(img_paths):
        # read and resize each image, then store it as its own
        # dataset named by its index
        img = cv2.resize(cv2.imread(path), (IMG_WIDTH, IMG_HEIGHT))
        h5f.create_dataset(str(x), data=img)
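My best guess at extending this to all 10 folders is below. It assumes the class folders sit under ./images/ (e.g. ./images/cat/*.jpeg) and uses each folder's name as the label, but I don't know whether this is the correct way to use groups:

import glob
import os
import h5py
import cv2

IMG_WIDTH = 30
IMG_HEIGHT = 30

# assumed layout: ./images/<class_name>/*.jpeg
folders = sorted(glob.glob('./images/*/'))

with h5py.File('train.h5', 'w') as h5f:
    data_grp = h5f.create_group('data')      # image arrays
    label_grp = h5f.create_group('labels')   # class names
    i = 0
    for folder in folders:
        # the folder's name becomes the classification label
        label = os.path.basename(os.path.dirname(folder))
        for path in glob.glob(os.path.join(folder, '*.jpeg')):
            img = cv2.resize(cv2.imread(path), (IMG_WIDTH, IMG_HEIGHT))
            # matching dataset names pair each image with its label
            data_grp.create_dataset(str(i), data=img)
            label_grp.create_dataset(str(i), data=label)
            i += 1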