Delayed decorator in Dask library - results are counterproductive

Question

I am trying to learn how to use the dask library and followed the tutorial at https://www.machinelearningplus.com/python/dask-tutorial/

Code with dask delayed decorator

import time
import dask
from dask import delayed

@delayed
def square(x):
    return x*x

@delayed
def double(x):
    return x*2

@delayed
def add(x, y):
    return x + y

@delayed
def sum(output):
    sum = 0
    for i in output:
        sum += i

    return sum

t1 = time.time()

# For loop that calls the above functions for each data
output = []
for i in range(99999):
    a = square(i)
    b = double(i)
    c = add(a, b)
    output.append(c)

total = dask.delayed(sum)(output)
print(total)

print("Elapsed time: ", time.time() - t1)

Elapsed time : ~8.46s

Normal code without any dask / decorator

import time

def square(x):
    return x*x

def double(x):
    return x*2

def add(x, y):
    return x + y

def sum(output):
    sum = 0
    for i in output:
        sum += i

    return sum

t1 = time.time()

# For loop that calls the above functions for each data
output = []
for i in range(99999):
    a = square(i)
    b = double(i)
    c = add(a, b)
    output.append(c)

total = sum(output)
print(total)

print("Elapsed time: ", time.time() - t1)

Elapsed time : ~0.043s

Both code variants were executed on:

Windows machine
4 cores
8 logical cores
Python 3.11.0
dask version 2023.6.0

Shouldn't the code using dask's @delayed decorator perform better than the variant where the functions are executed serially? Is it the overhead of building the task graph, and of deciding which tasks run in parallel or serially, that makes it counterproductive? I wondered whether the iteration count was too small to realize the benefits of dask, but increasing it gave the same result.

Can someone please clarify it?

My basic understanding of Dask allows me to claim that the Dask example does not compute anything but merely creates a graph of `Delayed` objects. You should call `compute()` to compute the total. That will make it even slower for sure and can be an example of code that is better to run without Dask. — Jacek Laskowski, Jun 12 '23 at 10:19

score -1 · Answer 1 · answered Jun 12 '23 at 13:28

Is it overhead in identifying the tasks to be executed in parallel or serial via task graph making it counterproductive

Yes, there is a cost to both defining the graph of execution and then executing each task. At the very minimum, this involves switching threads and checking whether each task is done. In practice, there are other costs associated with deciding which task to do next and stitching together results. Because each of your Python functions runs in under 100 ns, dask's per-task overhead is significant by comparison.

the Dask example does not compute anything but merely creates a graph of Delayed objects. You should call compute() to compute the total.

This is right. For this example, the cost of creating and storing a task for later execution is more than running the task.

In [4]: %%timeit
   ...: @delayed
   ...: def add(x, y):
   ...:     return x + y
   ...:
6.58 µs ± 26.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [5]: def add(x, y):
   ...:     return x + y
   ...:

In [6]: %timeit add(1, 1)
59.6 ns ± 0.0345 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
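The transcript above times applying the decorator. A complementary measurement is the cost of each call to a decorated function, which is what the loop in the question pays on every iteration. The sketch below uses the standard `timeit` module for that; the names `plain_add` and `lazy_add` and the call count are illustrative assumptions, and the numbers it prints will vary by machine, so none are quoted here.

```python
import timeit

from dask import delayed


def plain_add(x, y):
    return x + y


# Wrap once, outside the timed region, so only the calls are measured.
lazy_add = delayed(plain_add)

calls = 2_000  # assumption: small count so the sketch runs quickly

# Calling plain_add does the arithmetic immediately.
plain_seconds = timeit.timeit(lambda: plain_add(1, 1), number=calls)

# Calling lazy_add only records a task in a graph; nothing is added yet.
lazy_seconds = timeit.timeit(lambda: lazy_add(1, 1), number=calls)

print(f"plain call:   {plain_seconds / calls * 1e9:.0f} ns per call")
print(f"delayed call: {lazy_seconds / calls * 1e9:.0f} ns per call")
```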

The overhead to run a task depends on the scheduler, but is of the order of 100 µs for the threaded scheduler and closer to 1 ms for distributed.
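To make the point about `compute()` and schedulers concrete, here is a minimal sketch of the question's pipeline that actually executes the graph. It assumes a much smaller iteration count than the question's 99999 so it finishes quickly, and it renames the final reduction to `total_of` (a hypothetical name) to avoid shadowing the built-in `sum`.

```python
from dask import delayed
from dask.delayed import Delayed


@delayed
def square(x):
    return x * x


@delayed
def double(x):
    return x * 2


@delayed
def add(x, y):
    return x + y


@delayed
def total_of(values):
    # At execution time dask passes in the computed numbers,
    # so the built-in sum works on them directly.
    return sum(values)


n = 1_000  # assumption: reduced from 99999 to keep the sketch fast

# Building the graph: nothing is computed here.
terms = [add(square(i), double(i)) for i in range(n)]
lazy_total = total_of(terms)
is_lazy = isinstance(lazy_total, Delayed)

# Executing the graph, once per scheduler.
threaded_result = lazy_total.compute(scheduler="threads")
sync_result = lazy_total.compute(scheduler="synchronous")

# Plain Python reference for comparison.
expected = sum(i * i + 2 * i for i in range(n))
print(is_lazy, threaded_result == expected, sync_result == expected)
```

Printing `lazy_total` without calling `compute()` shows only a `Delayed` object, which is why the timing in the question covers graph construction alone.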

So when is dask useful?

if you have functions that take longer to run
if you can batch many small calculations into tasks where the overhead becomes small compared to the run time
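The batching idea in the second point can be sketched as follows: instead of one task per `square`, `double` and `add` call, each task handles a whole range of inputs in plain Python. The helper `process_chunk` and the chunk size are hypothetical choices for illustration, not values from the question.

```python
from dask import delayed


def process_chunk(start, stop):
    """Do all the tiny per-item work for one chunk in plain Python."""
    return sum(i * i + 2 * i for i in range(start, stop))


n = 99_999
chunk_size = 10_000  # assumption: tune so each task runs much longer than the overhead

# One task per chunk instead of three tasks per item.
chunk_tasks = [
    delayed(process_chunk)(start, min(start + chunk_size, n))
    for start in range(0, n, chunk_size)
]

# A final task adds up the per-chunk totals.
batched_total = delayed(sum)(chunk_tasks).compute()
print(len(chunk_tasks), batched_total)
```

This cuts the graph from roughly 300,000 tasks to 11, so the scheduling overhead is paid only a handful of times. With pure Python arithmetic the chunks still contend for the GIL under the threaded scheduler, so the gain here comes from the reduced overhead rather than from parallelism.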

Also note that your code as written would not parallelise well in a single process because of the GIL. In practice, the real way to speed up this kind of code is to make good use of numpy before you consider parallel options such as dask.
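As a rough illustration of the numpy route, the whole loop from the question can be expressed as array arithmetic. This is a sketch under the assumption that the inputs are the integers 0 to 99998, as in the question; the 64-bit dtype is set explicitly because the total exceeds the 32-bit range.

```python
import numpy as np

n = 99_999

# All inputs at once, as 64-bit integers to avoid overflow in the sum.
x = np.arange(n, dtype=np.int64)

# square(i) + double(i) for every i, then the total, in vectorised form.
numpy_total = int((x * x + 2 * x).sum())

# Plain Python reference for comparison.
python_total = sum(i * i + 2 * i for i in range(n))
print(numpy_total == python_total)
```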
