I have a large set of vectors, each computed from different data, so they reside on different workers. Is the following code the most efficient way to sum them?

import operator
import dask.bag as db

grads = [client.submit(compute_grad, x) for x in xs]  # list of futures
gradsum_future = client.compute(db.from_sequence(grads).fold(operator.add))
gradsum = client.gather(gradsum_future)
John
  • The above code fails nondeterministically with a KeyError. – John Jul 29 '18 at 01:32
  • What are the types of the `xs`, and what does `compute_grad` produce? How big are these? – mdurant Jul 29 '18 at 21:23
  • The `xs` are small Python objects that are sent around by pickling. `compute_grad` produces NumPy arrays of 2 million float32s. – John Jul 30 '18 at 03:45

1 Answer

Below is how I would have implemented it - does that work for you?

grads = client.map(compute_grad, xs)
gradsum_future = client.submit(sum, grads)
gradsum = gradsum_future.result()
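
As a possible variation (not part of the answer above): `client.submit(sum, grads)` runs the whole sum on a single worker, so every gradient array has to be transferred there. With arrays of 2 million float32s, adding the futures pairwise in a tree may spread both the transfers and the additions across workers. The sketch below assumes `client` is a `dask.distributed.Client` and relies on Dask resolving futures passed as arguments to `client.submit`. The names `tree_reduce` and `sum_gradients` are hypothetical helpers introduced here for illustration, and whether this is actually faster depends on the cluster, so it is worth measuring.

```python
import operator


def tree_reduce(submit, op, items):
    """Combine items pairwise, level by level, until one remains.

    `submit(op, a, b)` schedules op(a, b) and returns a handle to the
    result. With dask.distributed this would be `client.submit`, which
    accepts futures as arguments and returns a new future.
    """
    items = list(items)
    if not items:
        raise ValueError("tree_reduce needs at least one item")
    while len(items) > 1:
        # Pair up neighbours: (0, 1), (2, 3), ...
        paired = [
            submit(op, items[i], items[i + 1])
            for i in range(0, len(items) - 1, 2)
        ]
        # An odd item out is carried to the next level unchanged.
        if len(items) % 2:
            paired.append(items[-1])
        items = paired
    return items[0]


def sum_gradients(client, compute_grad, xs):
    """Hypothetical wrapper: map compute_grad over xs, then tree-sum."""
    grads = client.map(compute_grad, xs)             # list of futures
    total = tree_reduce(client.submit, operator.add, grads)
    return total.result()                            # pull the final array locally
```

Only the final summed array is brought back to the client; the intermediate sums stay on the workers as futures.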
Dave Hirschfeld