My question may be dumb, but I just started learning dask distributed. Any help is appreciated.
I have code like the following:
import operator

import dask
import dask.distributed


@dask.delayed
def do_something(date):
    # x and y are computed from date here (details omitted)
    return x, y


# Delayed helpers that pick one element out of a (x, y) result
get_item0 = dask.delayed(operator.itemgetter(0))
get_item1 = dask.delayed(operator.itemgetter(1))


def handle_x(list_x):
    # do something
    print(len(list_x))


def handle_y(list_y):
    # do something
    print(len(list_y))


def do_tasks():
    list_x, list_y = [], []
    dates = [20210101, 20210102, 20210103, 20210104, 20210105]
    for date in dates:
        result = do_something(date)
        x = get_item0(result)
        y = get_item1(result)
        list_x.append(x)
        list_y.append(y)
    return list_x, list_y


# cluster is created elsewhere
with dask.distributed.Client(cluster) as dask_client:
    tasks = do_tasks()
    list_x = get_item0(tasks)
    list_y = get_item1(tasks)
    # I want to print 5, which is the number of dates, but this prints 2
    print(len(tasks))
    # I want to pass list_x and list_y to handle_x and handle_y separately,
    # but the following code computes do_tasks twice. How do I fix that?
    dask_client.compute(dask.delayed(handle_x)(list_x)).result()
    dask_client.compute(dask.delayed(handle_y)(list_y)).result()
- How can I print out 5 (the number of dates)? print(len(tasks)) seems to print 2 (which is the length of the tuple (list_x, list_y)) instead of 5.
- I want to pass list_x and list_y to handle_x and handle_y separately, but my code computes do_tasks twice.
How do I fix them?
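
For reference, here is a minimal sketch of one possible approach, not a definitive fix. It assumes the local default scheduler instead of a cluster, and the body of do_something, the `calls` counter, and the return values of the handlers are hypothetical stand-ins added only so the example runs. The idea is that do_tasks returns a plain tuple of two lists, so the number of dates is the length of either list, and that submitting both handler tasks in a single compute call lets dask merge the graphs so each shared do_something task runs once.

```python
import operator
import threading

import dask

calls = []                     # hypothetical: records each do_something call
calls_lock = threading.Lock()


@dask.delayed
def do_something(date):
    with calls_lock:
        calls.append(date)
    return date, date * 2      # hypothetical stand-ins for x and y


get_item0 = dask.delayed(operator.itemgetter(0))
get_item1 = dask.delayed(operator.itemgetter(1))


def handle_x(list_x):
    return len(list_x)         # returns instead of printing, for illustration


def handle_y(list_y):
    return len(list_y)


def do_tasks(dates):
    list_x, list_y = [], []
    for date in dates:
        result = do_something(date)
        list_x.append(get_item0(result))
        list_y.append(get_item1(result))
    return list_x, list_y


dates = [20210101, 20210102, 20210103, 20210104, 20210105]

# do_tasks returns an ordinary tuple, so unpack it directly
# rather than wrapping it in get_item0 / get_item1.
list_x, list_y = do_tasks(dates)
n_dates = len(list_x)          # length of one list, not of the 2-tuple

# One compute call for both handlers: the graphs are merged, so the
# do_something tasks they share are executed only once.
len_x, len_y = dask.compute(
    dask.delayed(handle_x)(list_x),
    dask.delayed(handle_y)(list_y),
)
```

With a distributed client, passing both delayed objects in one list to dask_client.compute and then calling dask_client.gather on the returned futures should behave similarly, since both are submitted as a single graph.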