How can I instruct dask to use a distributed Client as the scheduler, externally from the code, e.g. via an environment variable?

The motivation is to take advantage of one of the key features of dask - namely the transparency of going from a single machine to a distributed cluster. However, there seems to be one little thing obscuring this transparency - the need to register a Client via code.

I can set the named schedulers (e.g. "synchronous" and "processes") via the config (file/env var) as instructed here, but how do I use the same mechanism with a distributed one?

Ideally, I would like to set something like:

DASK_SCHEDULER=distributed(scheduler_file=...)

as an environment variable, which would be the equivalent of running client = Client(scheduler_file=...) within Python code.

This would then mean the EXACT same code can be run in different environments (local and distributed).

stav

1 Answer

One way to do it would be to pass the scheduler address as a command-line argument, for example using argparse. You could then run python my_script.py <ip:port>, where you specify either the address of the distributed scheduler or <127.0.0.1:port> for a local one.
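A minimal sketch of that idea, assuming the script is called my_script.py as above. The helper names (parse_args, connect, main) and the MY_SCHEDULER_ADDRESS environment variable are made up for illustration and are not part of dask; the only dask call used is Client(address) from dask.distributed.

```python
import argparse
import os


def parse_args(argv=None):
    """Read the scheduler address from the command line.

    Falls back to MY_SCHEDULER_ADDRESS, a hypothetical environment
    variable chosen for this example (dask itself does not define it).
    """
    parser = argparse.ArgumentParser(description="Run a dask workload.")
    parser.add_argument(
        "scheduler",
        nargs="?",
        default=os.environ.get("MY_SCHEDULER_ADDRESS"),
        help="scheduler address as ip:port; omit to keep dask's default scheduler",
    )
    return parser.parse_args(argv)


def connect(address):
    """Return a distributed Client for `address`, or None if no address is given."""
    if not address:
        return None
    # Imported lazily so the script still runs where distributed is not installed.
    from dask.distributed import Client

    return Client(address)


def main(argv=None):
    args = parse_args(argv)
    client = connect(args.scheduler)
    # ... the rest of the dask code stays the same in both environments ...
    return client


if __name__ == "__main__":
    main()
```

With this, python my_script.py <ip:port> connects to the given scheduler, while python my_script.py with no argument (and the variable unset) leaves dask on its default scheduler.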

mathdugre