dask.delayed isn't parallelizing, or at least it isn't faster than serial code. I took the example from the Dask docs (https://docs.dask.org/en/stable/delayed.html) and replaced "data" with a much longer list. The delayed version takes over 40 minutes from start to finish, while the exact same task run serially takes only 19 seconds. Task Manager also shows no increase in CPU usage during the run; it stays at about 2%. When code runs in parallel (e.g. with Ray, another Python parallel processing library), I am used to seeing every core at around 100%, so this leads me to think Dask isn't parallelizing at all.

import dask

@dask.delayed
def inc(x):
    return x + 1

@dask.delayed
def double(x):
    return x * 2

@dask.delayed
def add(x, y):
    return x + y

output = []
for x in [5]*10**7*3: #[5]*10**7*3 is a list of 30-million fives.
    a = inc(x)
    b = double(x)
    c = add(a, b)
    output.append(c)

total = dask.delayed(sum)(output)

total.compute() #takes over 40 minutes
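
For scale, here is a rough count of what the loop above asks Dask to do. This is my own back-of-the-envelope arithmetic rather than anything Dask reports, and the variable names are only illustrative: each iteration makes three delayed calls (inc, double, add), and there is one final sum.

```python
# Back-of-the-envelope count of the delayed calls built by the loop above.
n_items = 3 * 10**7          # length of [5]*10**7*3
calls_per_item = 3           # inc, double and add for each element
n_delayed_calls = n_items * calls_per_item + 1   # +1 for the final sum

# Expected result: each 5 becomes (5 + 1) + (5 * 2) = 16.
expected_total = n_items * 16

print(n_delayed_calls)   # 90000001
print(expected_total)    # 480000000
```

So the single compute() call is being handed roughly 90 million tiny delayed calls, each doing one addition or multiplication.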

The serial version of the same code is below; it took only 19 seconds:

def inc(x):
    return x + 1

def double(x):
    return x * 2

def add(x, y):
    return x + y

data = [1, 2, 3, 4, 5]

output = []
for x in [5]*10**7*3: #[5]*10**7*3 is a list of 30-million fives.
    a = inc(x)
    b = double(x)
    c = add(a, b)
    output.append(c)

total = sum(output)
print(total) #Took 19 seconds
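
In case the measurement method matters, this is a minimal sketch of how the serial timing could be reproduced on a smaller input. The helper name run_serial and the reduced size of 10**5 are assumptions for illustration only; time.perf_counter comes from the standard library.

```python
import time

def inc(x):
    return x + 1

def double(x):
    return x * 2

def add(x, y):
    return x + y

def run_serial(n):
    """Run the inc/double/add pipeline over n fives; return (total, seconds)."""
    start = time.perf_counter()
    output = []
    for x in [5] * n:
        output.append(add(inc(x), double(x)))
    total = sum(output)
    return total, time.perf_counter() - start

total, seconds = run_serial(10**5)  # scaled down from 30 million (assumption)
print(total, f"{seconds:.3f} s")
```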

Am I doing something wrong here? I can't work out what it could be.
