I have an HDF5 dataset of shape 80000 x 401408. I need to read data from it in batches of size 64, but the indices can be random, say (5, 0, 121, .., 2).
The problem is that while the reads are initially quite consistent and a batch takes say 0.5 seconds to complete, after a while some batches take longer, up to 10 seconds, while other batches are still read quickly. I have observed that as more and more reads take place, the reading process slows down.
import h5py
import numpy as np

hf = h5py.File(conv_file, 'r')
conv_features = hf['conv_features']

while True:
    conv_batch = [None for i in range(64)]
    for i in range(64):
        # each read pulls one 401408-element row and reshapes it to 14 x 14 x 2048
        conv_batch[i] = np.reshape(conv_features[some_random_index], [14, 14, 2048])
        # time for each of the above reads for conv_batch is different.. varies from 0.5 to 5 seconds.. and slows down over time.
I am not using chunks.
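For reference, this is a minimal sketch (assuming the same file and dataset names as above) of how I can confirm the storage layout; `Dataset.chunks` returns None for contiguous (unchunked) storage:

import h5py

with h5py.File(conv_file, 'r') as hf:
    ds = hf['conv_features']
    print(ds.shape, ds.dtype)
    print(ds.chunks)  # None means contiguous storage, i.e. no chunking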