Consider a 2D array X too large to fit in memory (in my case it's stored in the Zarr format, but that doesn't matter). I would like to map a function block-wise over the array and save the result without ever loading the entire array into memory. E.g.,
import dask.array as da
import numpy as np

X = da.arange(10_000_000, dtype=np.int32).reshape((10, 1_000_000)).rechunk((10, 1000))

def toy_function(chunk):
    # keepdims=True so each (10, 1000) block maps to a (1, 1000) block
    return np.mean(chunk, axis=0, keepdims=True)

# The function changes the block shape, so tell map_blocks the new chunk shape
lazy_result = X.map_blocks(toy_function, chunks=(1, 1000))
lazy_result.to_zarr("some_path")
Is there a way to limit the number of blocks evaluated at a time? In my use case, lazy_result[:, :1000].compute() fits in memory, but lazy_result.compute() is too big for memory. When I try to write to Zarr, memory usage increases until it maxes out and the process is killed. Can I do this without having to resort to something inconvenient like

for i in range(1000):
    lazy_result[:, (i * 1000):((i + 1) * 1000)].to_zarr("some_path" + str(i))
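For context, the closest I've gotten is a variant of that loop that at least writes into a single store instead of 1000 separate ones: pre-create the target Zarr array, then write into it region by region with da.store, so each iteration only materializes one slice's worth of blocks. This is a rough sketch (the step size of 1000 and the zarr.open parameters are illustrative), and it's still an explicit loop, which is exactly what I'm hoping dask can do for me internally:

import zarr

# Pre-create one Zarr array matching the lazy result's shape and chunking
z = zarr.open(
    "some_path", mode="w",
    shape=lazy_result.shape, chunks=(1, 1000), dtype=lazy_result.dtype,
)

step = 1000  # illustrative; how many columns to materialize per iteration
for i in range(0, lazy_result.shape[1], step):
    block = lazy_result[:, i:i + step]
    # regions= writes this slice into the existing array; each iteration
    # computes and flushes only the blocks covering `step` columns
    da.store(block, z, regions=(slice(None), slice(i, i + step)), lock=False)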