Why is threaded dask example executing in parallel

Question

For teaching purposes, I'm trying to create simple examples using dask delayed that highlight the GIL when using threads and not processes. I'm using the single-machine scheduler for now to keep things simple. My understanding was that switching from single-threaded to threads would have no change, since the GIL should stop things from executing in parallel.

That's not the case. When I use the threaded option, the code runs at least as fast as with processes, and in fact faster (single-threaded = 3 s, threads = 1 s, processes = 1.7 s). The three delayed calls execute at essentially the same time.

Obviously I don't understand what's going on as well as I thought. Can someone explain what's going on here? Why is the GIL not locking up my computations with threads?

import time
import dask
from dask import delayed


def func(i):
    import time
    print(f'Function {i:.0f} starting')
    time.sleep(1)
    print(f'Function {i:.0f} finished')


lazy = [delayed(func)(i) for i in range(3)]
with dask.config.set(scheduler='processes'):  # single-threaded, processes or threads
    start = time.time()
    dask.compute(lazy)
    elaps = time.time() - start
    print(elaps)

score 2 · Accepted Answer · answered Aug 27 '19 at 17:31

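The answer is really simple: time.sleep() does not hold the GIL. It releases the lock while it waits, since there is nothing to be done, so the other threads are free to run in the meantime. You would need to devise some real CPU-bound "work" in pure Python in order to hold the lock and degrade parallelism.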
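As a rough illustration of the difference, here is a minimal sketch that times a sleeping task and a busy-loop task on three threads. It uses the standard library's ThreadPoolExecutor rather than dask so that it runs without extra dependencies. The names sleeper, countdown, timed_threads and timed_serial are made up for this example, and the workload sizes are arbitrary. On a standard (GIL-enabled) CPython build, the sleeping tasks should overlap while the countdown tasks should take roughly as long as running them one after another. Exact timings depend on the machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def sleeper(seconds):
    # time.sleep releases the GIL while waiting, so threads overlap.
    time.sleep(seconds)
    return seconds


def countdown(n):
    # Pure-Python loop: the running thread holds the GIL while it
    # executes bytecode, so threads take turns instead of overlapping.
    while n > 0:
        n -= 1
    return n


def timed_threads(fn, args, workers=3):
    # Run fn over args on a thread pool; return (results, elapsed seconds).
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fn, args))
    return results, time.perf_counter() - start


def timed_serial(fn, args):
    # Run fn over args one after another; return (results, elapsed seconds).
    start = time.perf_counter()
    results = [fn(a) for a in args]
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical workload sizes, chosen only to keep the demo short.
    for name, fn, args in [
        ("sleep", sleeper, [0.5] * 3),
        ("countdown", countdown, [2_000_000] * 3),
    ]:
        _, serial = timed_serial(fn, args)
        _, threaded = timed_threads(fn, args)
        print(f"{name}: serial={serial:.2f}s threads={threaded:.2f}s")
```

The same countdown function could be wrapped in delayed in place of func in the question's code to compare the single-threaded, threads and processes schedulers.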

mdurant

Ah, bingo! I just got the same answer from a colleague. I changed my function to use a while-loop countdown and the example works now. Thanks. :) – jrinker Aug 27 '19 at 17:33
