0

hope all is well! I have hopefully a simple question.. I am trying to convert a double list comprehension line in my code to something that dask can handle (so it can speed up). The code looks like the the following :

np.array([np.array([x[1] for x in Function(subList)]) for subList in List])

This will output a two dimensional array, with the rows equivalent to the amount of subLists in List, and columns equivalent to the length of the Function output. My questions is how can I use Dask to represent this? My concern is that if the number of SubLists or length of Function output (or both) gets very large, this line can take up a lot of time. Therefore if I can somehow parallelize this, it would be ideal.

Any advice would be greatly appreciated.

rmsrms1987
  • 317
  • 1
  • 5
  • 11
  • Is your point that `function` is slow to run? – mdurant Jun 10 '20 at 15:11
  • Not entirely. My concern is that if either the number of SubLists or length of Function gets very large, this line will become very slow. So the idea for using dask would be to kick off a separate process for every iteration in the for loop. – rmsrms1987 Jun 10 '20 at 15:17
  • 1
    If you have very many function calls, dask will only make your situation worse by adding some overhead to each of those calls. You still can use `delayed(function)`, that's where I would look, but please update your question after you've had a go, with example inputs and expected results. – mdurant Jun 10 '20 at 17:28

0 Answers0