I have some long-running code (~5-10 minutes of processing) that I'm trying to run as a Dask Future. It consists of several discrete steps that I can either run as one function:

result: Future = client.submit(my_function, arg1, arg2)

Or split up into intermediate steps:

# build the same result from intermediate steps, each submitted as a Future
intermediate1 = client.submit(my_function1, arg1)
intermediate2 = client.submit(my_function2, arg1, arg2)
intermediate3 = client.submit(my_function3, intermediate2, arg1)
result = client.submit(my_function4, intermediate3)

If I run this locally (e.g., result = my_function(arg1, arg2)), it completes. If I submit it to Dask, I immediately get my Future back, as expected, but the job never completes. Further, if I grab result.key to track the job's status and later reconstruct the future as result = Future(key), its state is always pending.

First I want to get it running as-is, so the processing is offloaded to my Dask workers instead of the API that handles the requests; then I want to start splitting the work across nodes to improve performance. But why are my jobs just evaporating? Looking at the Dask scheduler web interface, the jobs don't even appear to show up, yet I know Dask is working because I can submit code to it from my Jupyter notebook.

I'm calling client.submit from a Flask server, and I'm returning the key so it can be used later. Roughly:

@app.route('/submit')
def submit():
    # ...
    future = client.submit(my_function, arg1, arg2)
    return jsonify({"key": future.key})  # hand the key back to the caller

@app.route('/status/<key>')
def status(key):
    future = Future(key)  # reconstruct the future from the key alone
    return jsonify({"status": future.status})

When my application is deployed to Kubernetes, my /submit route returns a Future key, but the Dask status page doesn't show any task. If I run Flask locally, I do see a task show up, and the output of my job appears after the expected delay; however, when I hit my own /status/<key> path with the Future key returned from /submit, it always shows the state as pending.

user655321

1 Answer

If all futures pointing to a task disappear, Dask feels free to forget about that task. This allows Dask to clean up work rather than keep every intermediate result around forever.
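
A quick way to see this (the bare local Client and toy function here are assumptions for illustration, not from the question):

from dask.distributed import Client, Future

client = Client()                    # a bare Client() starts a local cluster
future = client.submit(sum, [1, 2, 3])
key = future.key
del future                           # the last reference is gone, so the scheduler may forget the task
Future(key).status                   # a future reconstructed from the key can then sit at 'pending' forever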

If you want to hold on to results, you'll need to hold on to the futures; this tells Dask that you still care about them. You can do this locally in your Flask app by keeping a dictionary.

futures = {}  # strong references, so the scheduler won't forget these tasks

@app.route('/submit')
def submit():
    # ...
    future = client.submit(my_function, arg1, arg2)
    futures[future.key] = future  # hold the future for as long as we care about the result
    return jsonify({"key": future.key})

@app.route('/status/<key>')
def status(key):
    future = futures[key]  # look up the live future instead of reconstructing it
    return jsonify({"status": future.status})

But you'll also want to think about when you can clean up and release those futures; with this approach alone you will slowly fill up your memory.
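
One hedged way to do that, assuming each result only needs to be collected once: pop the future out of the dictionary when the caller fetches it, after which Dask is free to forget the task. The /result route below is an illustration, not part of the original app:

@app.route('/result/<key>')
def result(key):
    # Hypothetical route: return the result once, then drop our reference
    # so the scheduler can release the task and reclaim its memory.
    future = futures.pop(key, None)
    if future is None:
        return jsonify({"error": "unknown key"}), 404
    return jsonify({"result": future.result()})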

MRocklin
  • This is, in fact, exactly what I did yesterday but didn't come back here to update my question. – user655321 Jun 13 '20 at 15:48
  • Is there a better way to do this than storing it in our Flask app? Can't we tell Dask to hold on to the futures in some way? – Sukanya Dasgupta Dec 11 '20 at 11:51
  • As Flask is generally used along with some WSGI server, requests could in general be handled by different processes/threads, so wouldn't this run the risk that the given future is not in the dictionary? Or more generally, is there a way to do exactly what's done in this sample but in a multi-process environment? I've been trying to use `Client.datasets` to make the cluster hold the futures, but also run into the issue that `future` in `status` always is `'pending'` for a moment. – fuglede Sep 03 '21 at 09:48
  • @fuglede Did you find a solution to this? As I have exactly the same issue, except I am using dash instead of Flask, I tried to use the "store" component to store the future on the backend, but still get the same 'pending' state. – mp252 Mar 30 '22 at 13:42
  • @mp252: Nothing really great I'm afraid; some other behavior that caught me by surprise at first was `client.datasets['mydata'].done()` being `False` for completed tasks when the futures (of the completed tasks) are stored elsewhere, and the status sometimes being `'pending'` until I call `.result()`. – fuglede Mar 30 '22 at 14:10
  • So in particular, the latter issue would make it impossible to get a good signal for completion. I guess in most cases you can get away with keeping the client in a single thread, then communicating with that in whichever way is more convenient. – fuglede Mar 30 '22 at 14:20
  • Thanks @fuglede, it is really bizarre that Dask forgets about the task, as Celery does not; but Celery wasn't able to work with my networkx graph, which is the reason I chose Dask. Oh well, I will give what you have said a go with datasets. – mp252 Mar 31 '22 at 07:55
  • I would forgo datasets for this purpose, and instead just go with a design where the Dask client lives in its own thread, separate from the web server workers'. – fuglede Mar 31 '22 at 13:20
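
For reference, a minimal sketch of the single-thread design @fuglede describes, with all names and the scheduler address hypothetical: one dedicated thread owns the Client and the futures dictionary, and request handlers talk to it through a thread-safe queue.

import queue
import threading

from dask.distributed import Client

jobs = queue.Queue()   # (func, args, reply_queue) tuples from request handlers
futures = {}           # key -> Future, written only by the client thread

def client_loop():
    # This thread is the sole owner of the Client and the futures it creates.
    client = Client("tcp://scheduler:8786")  # address is an assumption
    while True:
        func, args, reply = jobs.get()
        future = client.submit(func, *args)
        futures[future.key] = future
        reply.put(future.key)

threading.Thread(target=client_loop, daemon=True).start()

A /submit handler would then put (my_function, (arg1, arg2), reply) on jobs and read the key back from reply, while /status can read futures[key].status. This keeps all Client interaction in one thread, but it still only works within a single process; a multi-process WSGI server would need one such thread per worker process or an external store.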