3

what is the appropriate way to scatter broadcast a list using Dask disitributed?

case 1 - wrapping the list:

[future_list] = client.scatter([my_list], broadcast=True)

case 2 - not wrapping the list:

future_list = client.scatter(my_list, broadcast=True)

In the Dask documentation I have seen both examples: 1. wrapping (see bottom example) and 2. not wrapping. In my experience case 1 is the best approach, in case 2 constructing the Dask graph (large in my use case) takes a lot longer.

What could explain the difference in graph construction time? Is this expected behaviour?

Thanks in advance.

Thomas

Thomas Moerman
  • 882
  • 8
  • 16

1 Answers1

6

If you call scatter with a list then Dask will assume that each element of that list should be scattered independently.

a, b, c = client.scatter([1, 2, 3], ...)

If you don't want this, if you actually just want your list to be moved around as a single piece of data, then you should wrap it in another list

[future] = client.scatter([[1, 2, 3]], ...)
MRocklin
  • 55,641
  • 23
  • 163
  • 235