I'm trying to use dask to run a function on every combination of elements drawn from several arrays, and I'm struggling to apply it.

The serial code is as follows:

for i in range(5):
    for j in range(5):
        for k in range(5):
            # the for loops advance i, j and k themselves; no manual increment is needed
            function(listA[i], listB[j], listC[k])
            print(f'{i}.{j}.{k}')

This code takes 18 minutes to run on my computer even though each list has only 5 elements, and I want to run it in parallel with dask on much larger lists. The calculations inside the function are independent of one another. You can assume that the function computes listA[i]*listB[j]*listC[k].
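A minimal runnable version of the loop above, assuming (as the question suggests) that `function` simply multiplies its three arguments; the list contents are hypothetical. Instead of printing, it stores one result per `(i, j, k)` index triple, which gives the 5^3 = 125 outputs mentioned in the comments.

```python
# Hypothetical stand-in for the real, expensive function.
def function(a, b, c):
    return a * b * c

# Hypothetical inputs: five elements per list, as in the question.
listA = [1, 2, 3, 4, 5]
listB = [10, 20, 30, 40, 50]
listC = [100, 200, 300, 400, 500]

results = {}
for i in range(len(listA)):
    for j in range(len(listB)):
        for k in range(len(listC)):
            # Key each result by its index triple, mirroring the f'{i}.{j}.{k}' label.
            results[(i, j, k)] = function(listA[i], listB[j], listC[k])

print(len(results))                            # 125
print(results[(0, 0, 0)], results[(4, 4, 4)])  # 1000 125000
```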

After searching online for a long time, I couldn't find a solution. Any help is much appreciated.

Roie15
  • Are `listA`, `listB`, and `listC` lists of arrays? – darthbith Jan 07 '22 at 16:55
  • No, each list is unrelated to the others. – Roie15 Jan 07 '22 at 17:26
  • I'm sorry, I meant, what are the contents of the lists? Is `listA[i]` an array? – darthbith Jan 07 '22 at 17:32
  • No, for example: listA = [1,2,3,4,5], listA[0] = 1 – Roie15 Jan 07 '22 at 17:37
    Then I would recommend that you convert your lists into true NumPy arrays. That will allow you to conduct the calculations using vector and matrix mathematical functions which should speed things up considerably, even without using dask. It will also make it much easier to transition your code to dask if you need to. Unfortunately, your example isn't detailed enough for me to know what specific operations you'd need though. In particular if you could include some sample input and output, that would help a bunch. – darthbith Jan 07 '22 at 18:47
  • Thank you for your help. Many operations happen inside my function; the input is just as described, and the output is the result of each iteration. If each list has 5 elements, there are 5^3 results. Please let me know if any other details are missing. – Roie15 Jan 08 '22 at 10:26
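Following darthbith's suggestion to switch to NumPy arrays, here is a sketch for the case where the function really is just `listA[i]*listB[j]*listC[k]`. Broadcasting builds the full 5-by-5-by-5 grid of products in one expression, with `out[i, j, k]` corresponding to iteration `(i, j, k)` of the loop. The input values are hypothetical, and this only carries over to the real function if every operation inside it can be expressed on whole arrays.

```python
import numpy as np

# Hypothetical inputs, converted to NumPy arrays as recommended.
a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])
c = np.array([100, 200, 300, 400, 500])

# Give each array its own axis; broadcasting then forms every
# product a[i] * b[j] * c[k] without a Python-level loop.
out = a[:, None, None] * b[None, :, None] * c[None, None, :]

print(out.shape)     # (5, 5, 5)
print(out[1, 2, 3])  # 24000
```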

1 Answer

The snippet can be improved before using dask. Instead of iterating over an index and then looking up the corresponding item in a list, one can iterate over the list directly (i.e. use `for item in list_A:`). Since here we want every combination of one item from each of the three lists (the Cartesian product), we can use the built-in `itertools.product`. Note that `itertools.combinations` is not suitable: it selects `r` items from a single iterable.

from itertools import product

# One (a, b, c) tuple for every pairing of items from the three lists
triples = product(list_A, list_B, list_C)
for a, b, c in triples:
    function(a, b, c)
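To see why `product` rather than `combinations` is the right tool, here is a quick check with hypothetical inputs: `product` yields one tuple per choice of an element from each list (5 × 5 × 5 = 125 tuples, with the last list varying fastest), whereas `combinations(iterable, r)` draws `r`-element subsets from a single iterable and does not accept three lists.

```python
from itertools import combinations, product

# Hypothetical inputs with five elements each.
list_A = [1, 2, 3, 4, 5]
list_B = [10, 20, 30, 40, 50]
list_C = [100, 200, 300, 400, 500]

# Cartesian product: every (a, b, c) with a from list_A, b from list_B, c from list_C.
triples = list(product(list_A, list_B, list_C))
print(len(triples))  # 125
print(triples[:2])   # [(1, 10, 100), (1, 10, 200)]

# combinations() picks r items from ONE iterable, which is a different question.
print(len(list(combinations(list_A, 3))))  # 10
```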

To use dask, one option is the delayed API. Wrapping `function` with `dask.delayed` immediately returns a lazy reference to its result without running it. After collecting all the lazy references, we can compute them in parallel with `dask.compute`:

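import dask
from itertools import product

triples = product(list_A, list_B, list_C)
# One lazy task per (a, b, c) combination; nothing runs yet
delayeds = [dask.delayed(function)(a, b, c) for a, b, c in triples]
# Run all tasks; results come back as a tuple in the same order as delayeds.
# The default scheduler for delayed uses threads; for CPU-bound pure-Python
# work, dask.compute(*delayeds, scheduler="processes") may parallelize better.
results = dask.compute(*delayeds)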
SultanOrazbayev