
I am getting the error:

Dask - WARNING - Worker exceeded 95% memory budget.

I am working on a local PC with 4 physical and 8 virtual cores. I have tried the following:

Per...

Managing worker memory on a dask localcluster

...and the documentation here...

https://distributed.readthedocs.io/en/latest/worker.html#memory-management

...I have tried editing .config\dask\distributed.yaml to uncomment the bottom five lines...

distributed:
  worker:
    # Fractions of worker memory at which we take action to avoid memory blowup
    # Set any of the lower three values to False to turn off the behavior entirely
    memory:
      target: 0.60  # target fraction to stay below
      spill: 0.70  # fraction at which we spill to disk
      pause: 0.80  # fraction at which we pause worker threads
      terminate: 0.95  # fraction at which we terminate the worker

I have also tried the following in my code:

from dask.distributed import Client, LocalCluster

worker_kwargs = {
    'memory_limit': '1G',
    'memory_target_fraction': 0.6,
    'memory_spill_fraction': 0.7,
    'memory_pause_fraction': 0.8,
    # 'memory_terminate_fraction': 0.95,
}

cluster = LocalCluster(ip='0.0.0.0', n_workers=8, **worker_kwargs)
client = Client(cluster, memory_limit='4GB')

...with and without the memory_limit argument to the Client() function.
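
For what it's worth, a quick way to see what limit the workers actually picked up (a sketch, assuming the client created above; I believe scheduler_info() reports each worker's memory_limit in bytes):

# Ask the scheduler what memory_limit each worker actually ended up with
# (assumes the `client` created above; values are reported in bytes)
for addr, info in client.scheduler_info()['workers'].items():
    print(addr, info['memory_limit'])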

Any ideas?

1 Answer


If you do not want dask to terminate the worker, you need to set terminate to False in your distributed.yaml file:

distributed:
  worker:
    # Fractions of worker memory at which we take action to avoid memory blowup
    # Set any of the lower three values to False to turn off the behavior entirely
    memory:
      target: 0.60  # target fraction to stay below
      spill: 0.70  # fraction at which we spill to disk
      pause: 0.80  # fraction at which we pause worker threads
      terminate: False  # fraction at which we terminate the worker

(You might also want to set pause to False.)
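
If editing the YAML file is awkward, you should (as far as I know) also be able to set the same keys programmatically with dask.config.set before the cluster is created. A minimal sketch (the worker count and memory limit are just placeholders):

import dask
from dask.distributed import Client, LocalCluster

# Disable worker termination (and, optionally, pausing) before starting the cluster
dask.config.set({'distributed.worker.memory.terminate': False})
# dask.config.set({'distributed.worker.memory.pause': False})

cluster = LocalCluster(n_workers=8, memory_limit='1GB')  # placeholder sizing
client = Client(cluster)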

The file is typically located at ~/.config/dask/distributed.yaml.

Caveat: Do not forget to also uncomment the distributed:, worker: and memory: lines. Otherwise the change will have no effect.
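
A quick way to check whether your edits were actually picked up is to print the effective configuration; if the uncommenting worked, the values from distributed.yaml should show up:

import dask

# Should print something like {'target': 0.6, 'spill': 0.7, 'pause': 0.8, 'terminate': False}
print(dask.config.get('distributed.worker.memory'))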

  • I tried this to no effect. However, modifying config.yaml in .dask seems to stop the behavior. It then dawned on me that I am running a conda virtual environment, so how does that factor in? Where would I find the distributed.yaml for the virtual environment? There is one in .config\dask, which is the one I modified before, but I wonder if there is one under venv somewhere that I can't find? – P. S.R. Sep 19 '19 at 19:07
  • I don't know whether the configuration is influenced by venv. One more problem that I had with that config file: you have also uncommented the `#distributed:` and `#worker:` lines, correct? – Arco Bast Sep 20 '19 at 09:27
  • Is there an alternative way to encourage each worker to only bite off as much as it knows it can chew? Dask knows the memory profile of the machine it's running on, the size of the data being shifted around, and the tasks it's aiming to perform, and for the most part makes good automatic choices on sizing and blocksize - could an alternative solution be to override these sensibly? The question is how to do this without making it worse. – Thomas Kimber Jan 08 '21 at 17:16
  • @Thomas Kimber Posted as a comment, I guess your question will not get the attention of people who could answer it. You could try asking a question here on StackOverflow or raising an issue on GitHub. – Arco Bast Jan 11 '21 at 16:18
  • What is considered a reasonable value for "target" and "spill"? Is it feasible to set both to False, too? – Anatoly Alekseev Mar 21 '22 at 00:21