I am trying to learn how to use the dask library and followed the tutorial at https://www.machinelearningplus.com/python/dask-tutorial/
Code with the dask delayed decorator:
import time
import dask
from dask import delayed

@delayed
def square(x):
    return x*x

@delayed
def double(x):
    return x*2

@delayed
def add(x, y):
    return x + y

@delayed
def sum(output):
    sum = 0
    for i in output:
        sum += i
    return sum

t1 = time.time()
# For loop that calls the above functions for each data
output = []
for i in range(99999):
    a = square(i)
    b = double(i)
    c = add(a, b)
    output.append(c)
total = dask.delayed(sum)(output)
print(total)
print("Elapsed time: ", time.time() - t1)
Elapsed time: ~8.46 s
Plain code without dask or any decorator:
import time

def square(x):
    return x*x

def double(x):
    return x*2

def add(x, y):
    return x + y

def sum(output):
    sum = 0
    for i in output:
        sum += i
    return sum

t1 = time.time()
# For loop that calls the above functions for each data
output = []
for i in range(99999):
    a = square(i)
    b = double(i)
    c = add(a, b)
    output.append(c)
total = sum(output)
print(total)
print("Elapsed time: ", time.time() - t1)
Elapsed time: ~0.043 s
Both variants were run in the following environment:
Windows machine
4 cores
8 logical cores
Python 3.11.0
dask version 2023.6.0
Shouldn't the code using dask's @delayed decorator perform better than the variant where the functions are executed serially? Is the overhead of building the task graph, and of working out which tasks can run in parallel, making it counterproductive? I wondered whether the iteration count was too small to realize the benefits of dask, so I tried increasing it, but the outcome is the same: the dask variant is still slower.
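One way to probe the overhead hypothesis would be to time graph construction and execution separately. Note that the delayed variant above never calls `.compute()`, so `print(total)` shows a `Delayed` object rather than a number, and the ~8.46 s covers only building the graph. The following is a minimal sketch, not a definitive benchmark: `process_chunk`, `n`, and `chunk_size` are hypothetical names and values introduced here for illustration, and the idea of grouping many small calls into a few larger tasks is an assumption about what might reduce the per-task overhead.

```python
import time

from dask import delayed


def square(x):
    return x * x


def double(x):
    return x * 2


def add(x, y):
    return x + y


def process_chunk(start, stop):
    # Hypothetical helper: plain Python loop, so one dask task
    # covers many values instead of three tasks per value.
    return sum(add(square(i), double(i)) for i in range(start, stop))


n = 10_000          # assumed iteration count, smaller than the 99999 above
chunk_size = 2_500  # assumed chunk size, giving 4 tasks

# Time graph construction only; nothing is executed yet.
t0 = time.perf_counter()
tasks = [
    delayed(process_chunk)(start, min(start + chunk_size, n))
    for start in range(0, n, chunk_size)
]
total = delayed(sum)(tasks)
build_time = time.perf_counter() - t0

# Time execution separately; compute() is what actually runs the graph.
t0 = time.perf_counter()
result = total.compute()
compute_time = time.perf_counter() - t0

print("Result:", result)
print("Graph build time:", build_time)
print("Compute time:", compute_time)
```

If the build time dominates in the per-call version but shrinks with chunking, that would point to per-task overhead rather than the computation itself. Timings will vary by machine, so no expected numbers are given here.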
Can someone please clarify it?