Is it possible to run dask from a python script?

In an interactive session I can just write

from dask.distributed import Client
client = Client()

as described in all the tutorials. However, if I put these lines in a script.py file and run it with python script.py, it crashes immediately.

Another option I found is to use MPI:

# script.py
from dask_mpi import initialize
initialize()

from dask.distributed import Client
client = Client()  # Connect this local process to remote workers

I then run the script with mpirun -n 4 python script.py. This doesn't crash, but if you print the client

print(client)
# <Client: scheduler='tcp://137.250.37.84:35145' processes=0 cores=0> 

you see that no processes or cores are in use. Accordingly, scripts run forever without doing anything.

How do I set my scripts up correctly?

DerWeh

1 Answer

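If you want to create processes from within a Python script, you need to protect that code in an if __name__ == "__main__": block. Otherwise the worker processes that Client() starts re-import the script and try to start a cluster of their own: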

from dask.distributed import Client

if __name__ == "__main__":
    client = Client()
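
For a slightly fuller picture, here is a minimal sketch of a complete script that starts a local cluster behind the guard and runs one task on it. The names square and main, and the choice of two single-threaded workers, are assumptions made for illustration and are not part of the original question:

```python
from dask.distributed import Client


def square(x):
    """Toy task, used only for illustration."""
    return x * x


def main():
    # Creating a Client without an address starts a local cluster, which
    # launches worker processes. This must only happen when the file is
    # run as a script, not when a worker process re-imports it.
    with Client(n_workers=2, threads_per_worker=1) as client:
        future = client.submit(square, 4)  # run the task on a worker
        return future.result()             # block until the result arrives


if __name__ == "__main__":
    print(main())
```

Keeping the cluster setup inside a function and calling it only from the guarded block means that importing the file has no side effects, which is the property the worker processes rely on.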

If you want to use dask-mpi then you need to run it with mpirun or mpiexec with a suitable number of processes.

MRocklin
  • Could you also provide a correct example of `dask_mpi` usage? Then I will mark the answer as accepted. Even with the ifmain guard, `mpirun -n 4 python script.py` still uses neither cores nor processes. – DerWeh Aug 05 '20 at 07:09
  • Hrm, my first guess would be that your MPI system isn't set up well, but presumably you've used it before. I don't know. I would ask the experts at github.com/dask/dask-mpi – MRocklin Aug 08 '20 at 01:29
  • I am obviously no expert, but if I run C++ code with `mpirun -n CORES`, it correctly runs in parallel. So unless `dask` relies on something special, I would say everything is OK. Thanks for the help, I will ask. – DerWeh Aug 08 '20 at 13:59