For teaching purposes, I'm trying to create simple examples using dask delayed that highlight the effect of the GIL when using threads rather than processes. I'm using the single-machine scheduler for now to keep things simple. My understanding was that switching from the single-threaded scheduler to threads would make no difference, since the GIL should stop the tasks from executing in parallel.

That's not the case. With the threaded scheduler, the code runs just as fast as with processes, and actually faster (single-threaded = 3 s, threads = 1 s, processes = 1.7 s). The three delayed calls execute at essentially the same time.

Obviously I don't understand this as well as I thought. Can someone explain what's going on here? Why is the GIL not serializing my computations with threads?

import time
import dask
from dask import delayed


def func(i):
    import time
    print(f'Function {i:.0f} starting')
    time.sleep(1)
    print(f'Function {i:.0f} finished')


lazy = [delayed(func)(i) for i in range(3)]
with dask.config.set(scheduler='processes'):  # try 'single-threaded', 'threads' or 'processes'
    start = time.time()
    dask.compute(lazy)
    elaps = time.time() - start
    print(elaps)
jrinker

1 Answer

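The answer is really simple: sleep() does not hold the GIL, as there is nothing to be done while it waits. time.sleep() releases the GIL for the duration of the sleep, so the other threads are free to run. You would need to devise some real CPU-bound "work" in pure Python in order to hold the GIL and degrade parallelism.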
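To make the difference concrete, here is a minimal sketch that compares a sleeping task with a pure-Python countdown loop. It uses the standard library's ThreadPoolExecutor as a stand-in for the threaded scheduler, which is an assumption made to keep the example dependency-free. The names sleepy, countdown and timed are hypothetical helpers, not part of dask, and the loop size is arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def sleepy(seconds):
    # time.sleep() releases the GIL while it waits, so other threads can run.
    time.sleep(seconds)
    return seconds


def countdown(n):
    # A pure-Python loop holds the GIL while it executes bytecode,
    # so threads running it take turns instead of running in parallel.
    while n > 0:
        n -= 1
    return n


def timed(func, args, workers):
    """Run func once per argument on a thread pool; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(func, args))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    cases = [
        ("sleep", sleepy, [0.2] * 3),
        ("countdown", countdown, [2_000_000] * 3),  # arbitrary loop size
    ]
    for name, func, args in cases:
        _, one_worker = timed(func, args, workers=1)
        _, three_workers = timed(func, args, workers=3)
        print(f"{name}: 1 worker {one_worker:.2f}s, 3 workers {three_workers:.2f}s")
```

On a standard CPython build with the GIL, the sleep case should get roughly three times faster with three workers, while the countdown case should take about the same time either way. Exact timings depend on the machine, so none are quoted here.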

mdurant
  • Ah, bingo! I just got the same answer from a colleague. I changed my function to use a while-loop countdown and the example works now. Thanks. :) – jrinker Aug 27 '19 at 17:33