
System Info: CentOS, Python 3.5.2, 64 cores, 96 GB RAM

I'm trying to load a large array (50 GB) from an HDF5 file into RAM (96 GB). Each chunk is around 1.5 GB, which is below the per-worker memory limit. It never seems to complete: workers sometimes crash or restart, and I don't see the memory usage on the web dashboard increasing or any tasks being executed.

Should this work or am I missing something obvious here?

import dask.array as da
import h5py

from dask.distributed import LocalCluster, Client
from matplotlib import pyplot as plt

lc = LocalCluster(n_workers=64)
c = Client(lc)

f = h5py.File('50GB.h5', 'r')
data = f['data']
# data.shape = 2000000, 1000
x = da.from_array(data, chunks=(2000000, 100))  # 10 blocks of ~1.5 GB each (for float64)
x = c.persist(x)
  • 50 GB is the size on disk? – mdurant Nov 13 '18 at 19:17
  • Have you tried to load a single chunk and check (using `x.nbytes`) how much memory it uses? – rpanai Nov 13 '18 at 20:14
  • I think this is just a misunderstanding on my part: I thought each worker would get one chunk of the Dask array, but it seems to try to load the entire array on a single worker, which triggers the memory limit and restarts that worker. – dead_zero Nov 14 '18 at 11:52
  • @dead_zero that is exactly what it is trying to do. If your data is nicely partitioned for the calculation you want to perform, you can use the `dask.array` equivalent of `map_partitions` from `dask.dataframe`, or use a distributed loop (see the sketch after these comments). – rpanai Nov 14 '18 at 14:40
  • Ok I'm going to mark this as answered – dead_zero Nov 14 '18 at 16:42
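
The comment above refers to the `dask.array` counterpart of `dask.dataframe`'s `map_partitions`, which is `map_blocks`. Below is a minimal sketch of that chunk-by-chunk approach, assuming the file and dataset names from the question; the per-column mean is only a placeholder reduction. It also prints `x.nbytes`, as suggested in the second comment.

import dask.array as da
import h5py

f = h5py.File('50GB.h5', 'r')
data = f['data']                                  # shape (2000000, 1000)
x = da.from_array(data, chunks=(2000000, 100))    # 10 blocks, ~1.5 GB each

print(x.nbytes / 1e9)   # in-memory size of the whole array, in GB

# Each block is handed to the function independently, so no single worker
# ever needs to hold the full 50 GB array at once.
col_means = x.map_blocks(
    lambda block: block.mean(axis=0, keepdims=True),
    chunks=(1, 100),
).compute()
print(col_means.shape)   # (1, 1000)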

1 Answer


This was a misunderstanding of how chunks and workers interact. Specifically, changing the way the LocalCluster is initialised fixes the issue described in the question:

lc = LocalCluster(n_workers=1)  # this way the single worker has ~90 GB of memory, so the array can be persisted
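
For completeness, a minimal sketch of the full fixed workflow. The `threads_per_worker` and `memory_limit` arguments are assumptions here (only `n_workers` needs to change), so adjust them to your machine:

import dask.array as da
import h5py
from dask.distributed import LocalCluster, Client

# A single worker gets (almost) all of the machine's RAM, so the whole
# ~50 GB array fits in memory once persisted.
lc = LocalCluster(n_workers=1, threads_per_worker=64, memory_limit='90GB')
c = Client(lc)

f = h5py.File('50GB.h5', 'r')
data = f['data']                                  # shape (2000000, 1000)
x = da.from_array(data, chunks=(2000000, 100))    # 10 blocks, ~1.5 GB each
x = c.persist(x)                                  # load the array into the worker's RAM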