I'm doing deep learning with Caffe and generating my own dataset in HDF5 format. I have 131,976 images, all 224x224, which come to about 480 MB, and each image has a 1x6 array as a label. I've found that when I generate the .h5 files, they come to 5 GB each, 125 GB in total. I just want to make sure this is expected. I've checked the contents, but I don't understand how the storage requirement ends up roughly 250 times bigger. All I'm doing is filling the numpy arrays X and Y and creating the datasets (25 in total):
import h5py

with h5py.File('/media/joe/SAMSUNG/GraspingData/HDF5/train' + str(j) + '.h5', 'w') as H:
    H.create_dataset('graspData', data=X)   # note the name - it's given to the HDF5Data layer
    H.create_dataset('graspLabel', data=Y)
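To double-check whether the size is plausible, here is a small sketch of how I'd estimate the raw array footprint before writing; the 3-channel layout and float32 dtype are assumptions (adjust them to match the real X):

import numpy as np

# Assumed layout: N images, 3 channels, 224x224, stored as float32 for Caffe
n_images = 131976
image_shape = (3, 224, 224)
bytes_per_image = np.prod(image_shape) * np.dtype(np.float32).itemsize

# ~74 GiB of raw pixel data under these assumptions, before any HDF5 overhead
print(n_images * bytes_per_image / 2**30, 'GiB of raw pixel data')

# For the actual arrays, X.nbytes and Y.nbytes report the exact number of bytes
# h5py will write when no compression filter is passed to create_dataset.

If X.nbytes roughly matches the file sizes, then the 5 GB per file is just the uncompressed footprint of the arrays (create_dataset does accept compression='gzip' if that turns out to be the issue).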