Is there a way to limit memory usage of function call?

Question

I'm trying to run some code whose data doesn't fit in GPU memory (the same happens with CPU memory; our data is usually stored as zarr arrays), and I'm not sure how to do that with Dask.

I found this example and I'm following a similar strategy, but I get several warnings like the one below, and the data is not being processed on the GPU:

distributed.worker - WARNING - Memory use is high but worker has no data to store to disk. Perhaps some other process is leaking memory? Process memory: 12.85 GiB -- Worker memory limit: 7.45 GiB

For example:

import cupy as cp
import numpy as np
import dask.array as da
from dask_image import ndfilters as dfilters
from dask.distributed import Client
from functools import partial


if __name__ == '__main__':

    client = Client(memory_limit='8GB', processes=False)
    arr = da.from_array(np.zeros((50, 256, 512, 512), dtype=np.uint16), chunks=(1, 64, 256, 256))
    arr = arr.map_blocks(cp.asarray)

    filtering = partial(dfilters.gaussian_filter, sigma=2)

    scattered_data = client.scatter(arr)
    sent = client.submit(filtering, scattered_data)

    filtered = sent.result().compute()
    client.close()

The GPU has 24GB of memory.

Thanks in advance.

mdurant · Answer 1 · 2021-05-08T11:23:26.647

To answer the specific question: no, there is no way for Dask to know or control how much memory a task will use internally. From Dask's point of view, a task is arbitrary code that is simply called by Python. Monitoring the total process memory in a separate thread is the best tool available.
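As a rough illustration of what "monitoring the total process memory in a separate thread" can look like, here is a minimal sketch using only the standard library. MemoryWatcher and peak_rss_bytes are hypothetical names made up for this example, not part of Dask, and the resource module is only available on Unix-like systems. The watcher can only observe the process from the outside; it cannot cap what a single function call allocates.

```python
import resource
import sys
import threading
import time


def peak_rss_bytes():
    """Peak resident set size of this process so far, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


class MemoryWatcher(threading.Thread):
    """Hypothetical helper: polls peak RSS and flags when a limit is crossed."""

    def __init__(self, limit_bytes, interval=0.05):
        super().__init__(daemon=True)
        self.limit_bytes = limit_bytes
        self.interval = interval
        self.exceeded = threading.Event()
        self._stop_event = threading.Event()

    def run(self):
        # wait() returns False on timeout, so this polls until stop() is called.
        while not self._stop_event.wait(self.interval):
            if peak_rss_bytes() > self.limit_bytes:
                self.exceeded.set()

    def stop(self):
        self._stop_event.set()
        self.join()


if __name__ == "__main__":
    # Assumed limit mirroring the 8GB worker limit from the question.
    watcher = MemoryWatcher(limit_bytes=8 * 10**9)
    watcher.start()
    time.sleep(0.2)  # stand-in for the real workload
    watcher.stop()
    print("limit exceeded:", watcher.exceeded.is_set())
```

What to do once the flag is set (log, pause submitting work, restart the process) is up to the caller; the sketch only detects the condition.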

-previously-

Don't do this:

da.from_array(np.zeros((50, 256, 512, 512), dtype=np.uint16), chunks=(1, 64, 256, 256))

You are materialising a large array, chopping it up and shipping it to workers, where it will need to be deserialised before use. Always make your data in the workers if you can, which in this simplistic case would amount to

da.zeros((50, 256, 512, 512), dtype=np.uint16, chunks=(1, 64, 256, 256))

or in the case of zarr by using da.from_zarr.
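To put numbers on the difference, the sizes involved can be worked out from the shape, chunking and dtype in the question. This is plain arithmetic rather than a measurement; the only assumption is 2 bytes per element for np.uint16.

```python
from math import prod

shape = (50, 256, 512, 512)
chunks = (1, 64, 256, 256)
itemsize = 2  # bytes per element for np.uint16

array_bytes = prod(shape) * itemsize
chunk_bytes = prod(chunks) * itemsize
# The chunk shape divides the array shape evenly, so integer division is exact.
n_chunks = prod(s // c for s, c in zip(shape, chunks))

print(f"full array: {array_bytes / 2**30:.2f} GiB")  # full array: 6.25 GiB
print(f"one chunk:  {chunk_bytes / 2**20:.0f} MiB")  # one chunk:  8 MiB
print(f"chunks:     {n_chunks}")                     # chunks:     800
```

So the np.zeros version describes a single 6.25 GiB array in the client, while the lazy version lets each worker build 8 MiB chunks as it needs them.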

Thanks for the tip, but I don't think this solves my problem. — jordao, May 07 '21 at 17:53
