dask distributed: adding up a collection of vectors residing on different workers

Question

I have a large set of vectors that were computed on different data, so they reside on different workers. Is the following code the most efficient way to add them up?

import operator
import dask.bag as db

grads = [client.submit(compute_grad, x) for x in xs]  # list of futures
gradsum_future = client.compute(db.from_sequence(grads).fold(operator.add))
gradsum = client.gather(gradsum_future)

What are the types of the `xs`, and what does `compute_grad` produce? How big are these? — mdurant, Jul 29 '18 at 21:23
xs are small python objects that are sent around by pickling. compute_grad produces numpy arrays with 2 million float32s — John, Jul 30 '18 at 03:45

score 0 · Answer 1 · answered Jul 30 '18 at 03:48

Below is how I would have implemented it - does that work for you?

grads = client.map(compute_grad, xs)
gradsum_future = client.submit(sum, grads)
gradsum = gradsum_future.result()
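
Note that `client.submit(sum, grads)` runs the whole sum as a single task on one worker, so every array has to be moved to that worker first. If that transfer turns out to be the bottleneck, a pairwise (tree) reduction may spread the additions across workers. The following is only a sketch, not something I have benchmarked: `tree_sum` and `eager_submit` are hypothetical names introduced here, and `eager_submit` is a local stand-in so the snippet runs without a cluster.

```python
import operator


def tree_sum(submit, items):
    """Pairwise (tree) reduction: add neighbours until one item is left.

    `submit` is any callable of the form submit(fn, *args), for example
    the `client.submit` method of a dask.distributed Client, in which
    case `items` would be the list of futures.
    """
    items = list(items)
    if not items:
        raise ValueError("tree_sum needs at least one item")
    while len(items) > 1:
        # Combine items (0, 1), (2, 3), ... in this round.
        paired = [
            submit(operator.add, items[i], items[i + 1])
            for i in range(0, len(items) - 1, 2)
        ]
        if len(items) % 2:
            # An odd item out is carried over to the next round unchanged.
            paired.append(items[-1])
        items = paired
    return items[0]


def eager_submit(fn, *args):
    """Stand-in for client.submit that simply calls the function locally."""
    return fn(*args)


total = tree_sum(eager_submit, [1, 2, 3, 4, 5])
```

With a real cluster the call would look like `gradsum = tree_sum(client.submit, grads).result()`, since `client.submit` accepts futures as arguments and resolves them on the worker that runs the task.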

Dave Hirschfeld

