
I've set up a distributed system using Dask. When I start the scheduler using the Python API, its log output doesn't mention starting the dashboard, and indeed I cannot reach the dashboard at the address where I would expect it.

Since bokeh is installed, I'd expect the dashboard to be started. When I start the scheduler from the command line, however, the dashboard starts correctly. Why does starting the scheduler through the Python API not start the dashboard?

Relevant information:

  • python 3.6.7
  • dask 1.0.0
  • dask-glm 0.2.0
  • dask-ml 0.11.0
  • distributed 1.25.1
  • bokeh 1.0.3
  • tornado 5.1.1 (also tried with 4.5)

Scheduler output (via the Python API):

orval$ python3 myscheduler.py
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO -   Scheduler at:    tcp://10.33.14.65:8786

Code in myscheduler.py:

from distributed import Scheduler
from tornado.ioloop import IOLoop
from threading import Thread
s = Scheduler()
s.start('tcp://:8786')   # Listen on TCP port 8786
loop = IOLoop.current()
loop.start()

Starting the scheduler through the command line:

distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO -   Scheduler at:    tcp://10.33.14.65:8786
distributed.scheduler - INFO -       bokeh at:                     :8787
distributed.scheduler - INFO - Local Directory:    /tmp/scheduler-pg2wz3cg
distributed.scheduler - INFO - -----------------------------------------------
mathivh

1 Answer

Firstly, even when starting the scheduler within a python process, you may wish to consider using LocalCluster:

import dask.distributed
cluster = dask.distributed.LocalCluster(processes=False, n_workers=0)

where you can reach the scheduler as cluster.scheduler, and cluster.scheduler.services includes "bokeh".

When instantiating the Scheduler directly, as you are doing, you need to pass the services= keyword so that it includes the Bokeh dashboard. The class to use is distributed.bokeh.scheduler.BokehScheduler, for example:

services={('bokeh', diagnostics_port): (BokehScheduler, {})}
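To show where that fragment fits, here is a minimal sketch of the script from the question with the services= keyword added. It assumes the versions listed in the question (distributed 1.25.x), where the dashboard class is importable from distributed.bokeh.scheduler. The helper names bokeh_services and run_scheduler are hypothetical, and port 8787 is simply the port the command-line output above reports.

```python
def bokeh_services(dashboard_cls, port=8787):
    """Build the services mapping: {(name, port): (class, kwargs)}.

    Hypothetical helper; 8787 is the port shown in the command-line log.
    """
    return {("bokeh", port): (dashboard_cls, {})}


def run_scheduler(address="tcp://:8786", dashboard_port=8787):
    """Start a scheduler with the Bokeh dashboard attached (blocks forever)."""
    # Imports live here so that bokeh_services can be used on its own.
    # The import path below is an assumption tied to distributed 1.25.x.
    from distributed import Scheduler
    from distributed.bokeh.scheduler import BokehScheduler
    from tornado.ioloop import IOLoop

    s = Scheduler(services=bokeh_services(BokehScheduler, dashboard_port))
    s.start(address)  # listen on TCP port 8786, as in the question
    IOLoop.current().start()  # run the event loop until interrupted
```

Calling run_scheduler() at the bottom of myscheduler.py should then make the scheduler log a "bokeh at:" line similar to the command-line output, provided bokeh is installed.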

Were you wanting to do something particular with the loop and thread you have created? Perhaps, in that case, you can be more specific about what you want to achieve.

mdurant
  • Thanks for the answer, I will try it out once I have more time but it looks useful! I don't have a particular reason why I use the python api. It just happened to be the first approach I took on setting up dask. – mathivh Jan 07 '19 at 16:04