Not sure if this question makes sense or is relevant to zarr. I'm storing zarr data on disk in groups, so for example I have:
group = zarr.group()
d1 = group.create_dataset('baz', shape=100, chunks=10)
d2 = group.create_dataset('foo', shape=100, chunks=10)
Now group is iterable, so I can iterate over it and read the data from all of the datasets:
all_data = [group[g][:] for g in group]
Is there a way to read all of the data from the datasets in a group using multithreading to speed it up? I know that within a single array you can use multithreading to read and write data.
Assuming that reading the datasets one by one is too slow for me, should I put them all into one array container instead? I guess I'm wondering what the function of groups is, aside from being an organizational container. Because assuming that each dataset in a group contains similar data, you could theoretically just add another axis to your numpy array (indexing the datasets) and store everything in one big array.