Let's say I have a simple costly function that stores some results to a file:
import time

def costly_function(filename):
    time.sleep(10)
    with open(filename, 'w') as f:
        f.write("I am done!")
Now let's say I would like to schedule a number of these tasks in dask, so that it accepts the requests asynchronously and runs the functions one by one. I'm currently setting up a dask client object...
import dask.distributed

cluster = dask.distributed.LocalCluster(n_workers=1, processes=False)  # my attempt at sequential job processing
client = dask.distributed.Client(cluster)
... and then interactively (from IPython) scheduling these jobs:
>>> client.submit(costly_function, "result1.txt")
>>> client.submit(costly_function, "result2.txt")
>>> client.submit(costly_function, "result3.txt")
The issue I'm running into is that these tasks are not running consecutively but in parallel, which in my particular case is causing concurrency issues.
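I suspect the parallelism comes from the worker's thread pool rather than from multiple workers, since LocalCluster appears to default threads_per_worker to the number of cores. The only workaround I can think of is the minimal sketch below, which restricts the single worker to a single thread (assuming threads_per_worker is the relevant knob):

import dask.distributed

# Assumption: with one worker limited to one thread, submitted tasks should
# be executed one at a time instead of concurrently
cluster = dask.distributed.LocalCluster(n_workers=1, threads_per_worker=1, processes=False)
client = dask.distributed.Client(cluster)

# submit() still returns futures immediately, but the single-threaded worker
# should only run one costly_function at a time
future1 = client.submit(costly_function, "result1.txt")
future2 = client.submit(costly_function, "result2.txt")
future3 = client.submit(costly_function, "result3.txt")

I'm not sure whether this is actually the idiomatic way to get a sequential job queue, though, or whether it just papers over the concurrency.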
So my question is: What is the correct way to set up a job queue like the one I described above in dask?