
My system has 4 CPUs and 16 GB of RAM. My aim is to deploy Dask distributed workers that each use only 1 CPU to run the code assigned to them.

I am deploying a scheduler container and worker containers with Docker to run code that uses Dask delayed and Dask dataframes.

The docker run commands for both are as follows:

Scheduler

docker run --name dask-scheduler --network host daskdev/dask:2023.1.1-py3.10 dask-scheduler

Worker (I tried multiple docker run commands, all variations of the following)

docker run --name dask-worker --network host daskdev/dask:2023.1.1-py3.10 dask-worker 10.76.8.50:8786 --nworkers 1 --nthreads 1 --resources "process=1" --memory-limit 2GiB --name dask-worker

Creating the client and establishing a connection with the scheduler:
client = Client("tcp://10.76.8.50:8786")

What I wanted was for the worker to use only 1 CPU to run the code when dask.compute(scheduler="processes") is called. However, at least 3 CPUs can be seen at 100% capacity.

[Screenshot: CPU usage]

Is there something I have missed?
How could I limit a distributed worker to use ONLY 1 CPU for this compute work?

SMI

2 Answers

Specifying --resources "process=1" allocates that amount of the resource to each worker. However, for the scheduler to enforce the limit, each task must also declare how much of the resource it uses, for example with an annotation:

with dask.annotate(resources=dict(process=1)):
    ...

The snippet above ensures that every computation wrapped in the block consumes 1 unit of the process resource per task, so a worker that offers process=1 runs at most one such task at a time.
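A minimal sketch of how the worker-side resource and the task-side annotation fit together. It assumes an in-process LocalCluster as a stand-in for the Docker worker; the square function and the choice of two worker threads are illustrative only and not part of the question.

```python
import dask
from dask.distributed import Client, LocalCluster


def square(x):
    return x * x


# Stand-in for the real cluster (an assumption for this sketch): one
# in-process worker that advertises process=1, mirroring
# `--resources "process=1"` on the dask-worker command line.
cluster = LocalCluster(
    n_workers=1,
    threads_per_worker=2,
    processes=False,
    dashboard_address=None,
    resources={"process": 1},
)
client = Client(cluster)

# Every task built inside this block requests one unit of "process".
with dask.annotate(resources={"process": 1}):
    tasks = [dask.delayed(square)(i) for i in range(4)]

# No scheduler= argument: the active Client is used by default.
# optimize_graph=False is a precaution so that graph optimisation
# cannot merge tasks and lose their annotations.
results = dask.compute(*tasks, optimize_graph=False)
print(results)

client.close()
cluster.close()
```

Against the deployment in the question, the LocalCluster would be replaced by Client("tcp://10.76.8.50:8786"), with the resource declared on the worker's command line instead.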

SultanOrazbayev

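You are specifying scheduler="processes", which completely ignores your distributed cluster and runs the work in a local process pool instead. By default that pool has one process per CPU core, which is why several CPUs are busy. Instead, create a distributed client in the Python session where you run your code, client = dask.distributed.Client(...), with the TCP address of the scheduler, and call compute without a scheduler argument. The computation will then run on your workers automatically, with one thread each.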
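A hedged sketch of that pattern, again using an in-process LocalCluster as a stand-in for the Docker deployment (an assumption made so the example is self-contained). The inc function is hypothetical.

```python
import dask
from dask.distributed import Client, LocalCluster


def inc(x):
    return x + 1


# Stand-in for the Docker deployment: one worker with a single thread.
# Against the real cluster this would be Client("tcp://10.76.8.50:8786").
cluster = LocalCluster(
    n_workers=1,
    threads_per_worker=1,
    processes=False,
    dashboard_address=None,
)
client = Client(cluster)

# Thread count of each worker, as reported by the scheduler.
threads = [w["nthreads"] for w in client.scheduler_info()["workers"].values()]
print(threads)

total = dask.delayed(sum)([dask.delayed(inc)(i) for i in range(3)])

# No scheduler= argument, so the connected Client executes the graph.
# Passing scheduler="processes" here would run it in a local process
# pool on the calling machine and leave the cluster idle.
result = total.compute()
print(result)

client.close()
cluster.close()
```

Creating the Client is what registers the cluster as the default scheduler, so neither compute() nor dask.compute() needs any extra argument.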

mdurant
  • That is how I am doing it, and this was the result. The code was `Client("tcp://10.76.8.50:8786")`. Are you saying that the scheduler parameter should not be used in the compute function when a Client is connected to a distributed scheduler? – SMI Mar 15 '23 at 05:27
  • Right, as I said, `scheduler="processes"` conflicts with and overrides the distributed client, which would otherwise be the default engine. Don't use it! – mdurant Mar 15 '23 at 13:50
  • Thank you so much for this very integral information!!! – SMI Mar 16 '23 at 04:08