For teaching purposes, I'm trying to create simple examples using dask delayed that highlight the effect of the GIL when using threads rather than processes. I'm using the single-machine scheduler for now to keep things simple. My understanding was that switching from single-threaded to threads would make no difference, since the GIL should stop things from executing in parallel.
That's not the case. When I use the threaded option, the code runs at least as fast as (actually faster than) with processes (single-threaded=3s, threads=1s, processes=1.7s). The three delayed calls are being executed basically at the same time.
Obviously I don't understand what's going on as well as I thought. Can someone explain what's going on here? Why is the GIL not locking up my computations with threads?
import time

import dask
from dask import delayed


def func(i):
    import time
    print(f'Function {i:.0f} starting')
    time.sleep(1)
    print(f'Function {i:.0f} finished')


lazy = [delayed(func)(i) for i in range(3)]

with dask.config.set(scheduler='processes'):  # single-threaded, processes or threads
    start = time.time()
    dask.compute(lazy)
    elaps = time.time() - start

print(elaps)
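
For comparison, here is a minimal sketch of what I think separates the two cases. `time.sleep` releases the GIL while it waits, so a sleeping task may not exercise the GIL at all, whereas a pure-Python loop holds it while executing bytecode. The sketch uses the standard library's thread pool instead of dask to stay dependency-free; `sleep_task`, `cpu_task`, `timed` and the loop size `n` are names and values I made up for illustration, and the same two functions could presumably be wrapped in `delayed` instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def sleep_task(i):
    # time.sleep releases the GIL while waiting, so threads can overlap.
    time.sleep(0.5)
    return i


def cpu_task(i, n=2_000_000):
    # A pure-Python loop holds the GIL while it runs bytecode,
    # so threads take turns instead of running in parallel.
    total = 0
    for k in range(n):
        total += k
    return total


def timed(func, workers):
    # Run func(0), func(1), func(2) on a pool of `workers` threads
    # and return (results, elapsed seconds).
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(func, range(3)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    for func in (sleep_task, cpu_task):
        for workers in (1, 3):
            _, elapsed = timed(func, workers)
            print(f"{func.__name__} with {workers} thread(s): {elapsed:.2f}s")
```

On a standard CPython build with the GIL, I would expect `sleep_task` to get roughly three times faster with three threads, while `cpu_task` should take about the same time with one thread or three. Exact timings will depend on the machine.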