When I run a test case with Dask, I see 400%+ CPU usage even though I specify 1 worker in several ways. In Activity Monitor on OS X, with the ThreadPool I see 2 processes: one with 1 thread and the other with 8 threads. With the single-threaded scheduler I see 2 processes, one with 1 thread and the other with 4 threads. Any idea what all these threads are for?

Related: What threads do Dask Workers have active?

import dask
import dask.array as da
from dask.diagnostics import Profiler, ResourceProfiler, CacheProfiler, visualize
from multiprocessing.pool import ThreadPool

def main():
    a = da.random.random(size=(20000, 1000), chunks=(1000, 1000))
    q, r = da.linalg.qr(a)
    a2 = q.dot(r)
    out = a2.compute()


if __name__ == "__main__":
    with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof:
        #with dask.config.set(pool=ThreadPool(1)):
        #with dask.config.set(num_workers=1):  # 1 worker, 400% usage
        #with dask.config.set(num_workers=1, scheduler='single-threaded'):  # 1 worker, 400% usage
        with dask.config.set(pool=ThreadPool(1)):  # 1 worker, 400% usage
            main()
    visualize([prof, rprof])

Edit: If I comment out the profilers and the ThreadPool import, I get 1 process with 4 threads after specifying num_workers=1, scheduler='single-threaded'.

djhoese

1 Answer

Dask is only running a single task at a time, but those tasks can use many threads internally. In your case this is probably happening because your BLAS/LAPACK implementation is multi-threaded.

You can probably control this with environment variables like OMP_NUM_THREADS=1. There are more specific environment variables depending on your BLAS implementation.
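As a rough sketch of the environment-variable approach, the variables can also be set from Python, as long as that happens before NumPy or dask.array is first imported, since the native libraries typically read them when they load. The helper name `limit_native_threads` is hypothetical, and the list of variables is an assumption covering common backends (OpenMP, OpenBLAS, MKL, Apple Accelerate, numexpr); which one actually takes effect depends on the BLAS/LAPACK build in use.

```python
import os

# Assumption: only some of these apply to any given BLAS/LAPACK build.
# Setting variables that a backend does not use is harmless.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",         # OpenMP-based libraries
    "OPENBLAS_NUM_THREADS",    # OpenBLAS
    "MKL_NUM_THREADS",         # Intel MKL
    "VECLIB_MAXIMUM_THREADS",  # Apple Accelerate / vecLib
    "NUMEXPR_NUM_THREADS",     # numexpr
)


def limit_native_threads(n=1):
    """Hypothetical helper: cap native thread pools at n threads.

    Must be called before numpy / dask.array are imported for the first time.
    Returns the values that were set, for inspection.
    """
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n)
    return {name: os.environ[name] for name in THREAD_ENV_VARS}


# Set the limits first; import numpy / dask.array only after this line.
settings = limit_native_threads(1)
print(settings)
```

Setting the variable in the shell instead (for example, prefixing the command with OMP_NUM_THREADS=1) has the same intent and avoids any dependence on import order.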

MRocklin
  • You are absolutely correct. Setting `OMP_NUM_THREADS=1` shows 1 thread for 1 process in Activity Monitor. Thanks. – djhoese Nov 07 '18 at 18:00