I am trying to write a large dask array (46 GB, with 124–370 MB chunks) to a zarr file using dask. If my dask array were named `dask_data`, then a simple `dask_data.to_zarr("my_zarr.zarr")` would work. But from what I understand, this is a synchronous, CPU-bound process.
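For concreteness, here is a minimal stand-in for my setup (the shape, dtype, and chunk sizes are invented to roughly match the numbers above; my real data comes from elsewhere):

```python
import dask.array as da

# Hypothetical stand-in for my real array: ~46 GB of float32 in
# ~230 MB chunks (my actual chunks range from 124 to 370 MB).
dask_data = da.random.random((50_000, 230_000), chunks=(2_500, 23_000)).astype("float32")

# This works, but as far as I can tell the chunk encoding,
# compression, and I/O all happen on the CPU.
dask_data.to_zarr("my_zarr.zarr")
```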
What I would like to do is use parallelism, with much of the work offloaded to a Quadro GV100 GPU. I tried converting the underlying numpy.ndarray chunks to cupy.ndarray via `dask_data_cupy = dask_data.map_blocks(cupy.asarray)` and writing that out to a zarr file, but I get:

```
ValueError: object __array__ method not producing an array
```

(and frankly, I do not see a performance boost either).
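Here is the failing attempt as a runnable reproducer (the store paths are placeholders, and the comments are my guess at what is going on):

```python
import cupy
import dask.array as da

# Same stand-in array as above (hypothetical shape/chunks).
dask_data = da.random.random((50_000, 230_000), chunks=(2_500, 23_000)).astype("float32")

# Move each block onto the GPU.
dask_data_cupy = dask_data.map_blocks(cupy.asarray)

# Writing the CuPy-backed array fails, presumably because zarr
# coerces each block with np.asarray and CuPy refuses that implicit
# device-to-host conversion.
try:
    dask_data_cupy.to_zarr("my_zarr_gpu.zarr")
except ValueError as e:
    print(e)  # object __array__ method not producing an array

# Round-tripping back to host memory before the write does succeed,
# but then the compression and I/O still run on the CPU, which would
# explain why I see no speedup.
dask_data_cupy.map_blocks(cupy.asnumpy).to_zarr("my_zarr_roundtrip.zarr")
```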
How could I go about using a GPU to parallelize writing a dask array to a zarr file?
Thanks!