I'm reading data from a large text file (a VCF) into a Zarr array. The overall flow of the code is:
```python
import zarr

with zarr.LMDBStore(...) as store:
    array = zarr.create(..., chunks=(1000, 1000), store=store, ...)
    for line_num, line in enumerate(text_file):
        # one full row of the array per line of the VCF
        array[line_num, :] = process_data(line)
```
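To put rough numbers on the concern, here is a count of chunk encode-and-store operations. It assumes (not confirmed here) that Zarr keeps no write cache, so every assignment re-encodes and stores each chunk it overlaps. The helper `chunk_encodes` and the 100,000 × 5,000 array size are hypothetical and only for illustration:

```python
import math


def chunk_encodes(n_rows: int, n_cols: int, chunk_rows: int, chunk_cols: int,
                  rows_per_write: int) -> int:
    """Count chunk encode+store operations when an (n_rows, n_cols) array is
    written in blocks of `rows_per_write` full-width rows.

    Assumption: every assignment re-encodes each chunk it overlaps
    (no in-memory write cache)."""
    chunks_across = math.ceil(n_cols / chunk_cols)
    total = 0
    for start in range(0, n_rows, rows_per_write):
        stop = min(start + rows_per_write, n_rows)
        # chunk rows overlapped by array rows [start, stop)
        first = start // chunk_rows
        last = (stop - 1) // chunk_rows
        total += (last - first + 1) * chunks_across
    return total


# One row per assignment, as in the loop above
print(chunk_encodes(100_000, 5_000, 1000, 1000, rows_per_write=1))     # → 500000
# Chunk-aligned blocks of 1000 rows
print(chunk_encodes(100_000, 5_000, 1000, 1000, rows_per_write=1000))  # → 500
```

Under that assumption, one-row writes cost a factor of the chunk height (here 1000) more encodes than chunk-aligned block writes; blocks that straddle chunk boundaries fall somewhere in between.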
I'm wondering: when does Zarr compress the modified chunks of the array and push them to the underlying store (in this case LMDB)? Does it do that every time a chunk is updated (i.e. on every line)? Or does it wait until a chunk is full or evicted from memory?
Assuming that I need to process each line separately in a for loop (i.e. there are no efficient array operations to use here because of the nature of the data and the processing), is there any optimization I should make in how I feed the data into Zarr?
I just don't want Zarr to compress every modified chunk on every line when each chunk will be modified 1000 times before it is complete and ready to be saved to disk.
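One common way to avoid per-line recompression, whatever Zarr does internally, is to buffer processed lines and write one chunk-height block of rows at a time. Below is a minimal sketch; `RowBuffer` and the `write_block(start, rows)` callback are hypothetical names, and the demo uses a recording function in place of a real Zarr array:

```python
from typing import Callable, List, Sequence

Row = Sequence[float]


class RowBuffer:
    """Collect rows and hand them to `write_block` in chunk-height blocks.

    Hypothetical helper: `write_block(start, rows)` is expected to store
    `rows` at array rows [start, start + len(rows)). Because blocks start
    at row 0 and are `chunk_rows` tall, every full block is chunk-aligned.
    """

    def __init__(self, chunk_rows: int,
                 write_block: Callable[[int, List[Row]], None]) -> None:
        self.chunk_rows = chunk_rows
        self.write_block = write_block
        self.start = 0                # array row index of the first buffered row
        self.rows: List[Row] = []

    def append(self, row: Row) -> None:
        self.rows.append(row)
        if len(self.rows) == self.chunk_rows:
            self.flush()

    def flush(self) -> None:
        # Write whatever is buffered: a full chunk, or the final partial one
        if self.rows:
            self.write_block(self.start, self.rows)
            self.start += len(self.rows)
            self.rows = []


# Demo: record (start, n_rows) for each block instead of writing to Zarr
writes = []


def record(start: int, rows: List[Row]) -> None:
    writes.append((start, len(rows)))


buf = RowBuffer(chunk_rows=1000, write_block=record)
for i in range(2500):
    buf.append([float(i)])
buf.flush()  # don't forget the last, partial block
print(writes)  # → [(0, 1000), (1000, 1000), (2000, 500)]
```

With the Zarr array from the loop above, `write_block` could be something like `def write_block(start, rows): array[start:start + len(rows), :] = np.asarray(rows)`, with `buf.append(process_data(line))` inside the loop and `buf.flush()` after it. Preallocating a `(1000, n_cols)` NumPy buffer instead of a list of rows would avoid the extra copy, at the cost of tracking the fill position by hand.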
Thanks!