
I have a situation where several different Flask apps are used infrequently for real-time statistics computations.

In this case I need to have good performance when somebody is browsing one of the apps, and at the moment I have a nice and expensive cloud instance to serve them.

I would have liked to use a single Dask cluster to offload the computational heavy lifting, but the different Flask apps have different versions of the same libraries, and I cannot fix that. For example, each app comes in paired environments (production and test), and those will always have different module versions (by definition).

From what I've read in the docs, it is not trivial to have Dask workers load different versions of the same modules depending on the connecting client without reloading the modules altogether.

Is it possible to have a shared Dask cluster to offload computations from apps using different versions of the same modules?

-- EDIT --

I've seen a related issue here: https://github.com/cloudpipe/cloudpickle/issues/206

and a PR here: https://github.com/cloudpipe/cloudpickle/pull/391

Federico Bonelli

1 Answer


No, you cannot have Dask workers use different versions of packages for different workloads. Indeed, mismatched module versions between the client, scheduler and workers can cause failures on their own.
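As a sanity check, the distributed client can report such mismatches for you. A minimal sketch, assuming a shared cluster (the scheduler address is a placeholder):

```python
from dask.distributed import Client

# Connect to the shared cluster (address is a placeholder).
client = Client("tcp://scheduler:8786")

# Compare package versions across client, scheduler and workers;
# with check=True a mismatch raises an error instead of only warning.
versions = client.get_versions(check=True)
```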

You could get creative by assigning different environments (e.g., Docker images) to different groups of workers and claiming those worker resources per workload... but you are completely on your own if you try this.
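One way this could look, as a sketch only (the scheduler address, the resource names `app_a`/`app_b` and the function `compute_stats` are all hypothetical): start each worker pool from the environment that matches one app, tag it with a custom resource, and pin every task to the matching tag.

```python
# Each worker pool is started from the environment (conda env / Docker image)
# that matches one app, and advertises a custom resource tag, e.g.:
#
#   dask-worker tcp://scheduler:8786 --resources "app_a=1"   # env with app A's deps
#   dask-worker tcp://scheduler:8786 --resources "app_b=1"   # env with app B's deps

from dask.distributed import Client

client = Client("tcp://scheduler:8786")  # placeholder address

def compute_stats(data):
    # Hypothetical computation using app A's library versions.
    return sum(data)

some_data = [1, 2, 3]

# Pin the task to workers advertising the "app_a" resource, so it only
# ever runs inside the environment that has app A's dependencies.
future = client.submit(compute_stats, some_data, resources={"app_a": 1})
result = future.result()
```

Nothing prevents a mis-tagged task from landing in the wrong environment, which is why this stays a "you are on your own" pattern.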

mdurant
  • I've tried using cloudpickle for my packages that are not installed on the worker, and it looks like it's working, but I don't know whether unpickled objects on the workers conflict with one another when they have the same names. – Federico Bonelli Nov 16 '20 at 13:52
  • I am surprised that this is possible; it would be good to see cloudpickle support it officially. No, I don't think the names should cause a conflict. The only conflict may come from the hashed key of a task, which is made from the function and the arguments to that task. – mdurant Nov 16 '20 at 15:27
  • Since you were so surprised by that, I verified it a bit more thoroughly and found that it works, BUT only if I cloudpickle functions that are nested functions or lambdas. If I use a function defined at the top level of the module, it doesn't work. I am a bit puzzled at the moment. – Federico Bonelli Nov 16 '20 at 19:30
  • The lambdas can themselves call a top-level function, and it works without issues – Federico Bonelli Nov 16 '20 at 19:32
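The behaviour described in the comments matches how cloudpickle serialises functions: top-level (module-level) functions are pickled by reference (only the module and name are sent, so the worker must be able to import them), while lambdas and nested functions are pickled by value. The PR linked in the question's edit adds a way to force by-value serialisation for a whole module. A minimal sketch, where the module name `my_local_stats` and its `compute` function are hypothetical:

```python
import cloudpickle
import my_local_stats  # hypothetical module installed only on the client

# By default, my_local_stats.compute would be pickled by reference and the
# worker would fail to import it. Registering the module switches it to
# by-value serialisation (cloudpickle >= 2.0).
cloudpickle.register_pickle_by_value(my_local_stats)

payload = cloudpickle.dumps(my_local_stats.compute)
# ...submit to Dask as usual; the function now travels with the task
# instead of being looked up by name on the worker.
```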