Q: Is the GIL ever unlocked for non-IO-bound work? ... Yes, it can be ( in several ways ).

This was the OP's question, wasn't it?
So, let's solve it. Python is an interpreted language. The Python interpreter, by design, uses the GIL, a.k.a. the Global Interpreter Lock ( i.e., it is a Python-internal-only "LOCK-device" and has nothing to do with other O/S-locks, IO-locks et al ).

The GIL-lock is a soft-signalling tool used internally inside the Python interpreter to coordinate its own work and, principally, to avoid any concurrency-originated collisions ( two attempts to write a value into the same variable, or an attempt to read a potentially "old" value from a variable that is "currently" being written a "new" value into ), thus artificially introducing a deterministic, purely sequential, principally never-colliding ordering of such internal Python operations.
This means all Python threads obey the GIL-based signalling, so the effective concurrency of any pool of Python-GIL-coordinated threads is 1. IO-related waits are the one exception: there an external device introduces a "natural" wait-state, and such a "naturally" waiting thread releases the GIL-lock, signalling Python that it may "lend" that thread's wait-time to some other Python thread to do something useful. For computing-intensive thread-processing the same logic makes no sense, as none of the Python threads inside such a computing pool has any "externally" introduced "natural" wait-state; they need the very opposite, as much scheduled processor-time as possible. Yet the GIL plays a round-robin, pure-[SERIAL] sequence of the CPU working with the Python threads one after another:

tA-tB-tC-tD-tE-...-tA-tB-tC-tD-tE-...

thus efficiently avoiding any and all of the potential [CONCURRENT] process-scheduling benefits.
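That pure-[SERIAL] round-robin is easy to observe on your own machine; here is a minimal sketch ( the `burn()` helper and the loop size are illustrative assumptions, not code from the question ), showing that 2 threads of pure CPU-work do not finish in half the sequential time on a GIL-coordinated CPython:

```python
import threading, time

def burn( n ):                          # pure CPU-bound busy-work, no IO-waits
    s = 0
    for i in range( n ):
        s += i
    return s

N = 2_000_000

t0 = time.perf_counter()                # the same work, run sequentially
burn( N ); burn( N )
sequential_s = time.perf_counter() - t0

t0 = time.perf_counter()                # the same work, split over 2 threads
pair = [ threading.Thread( target = burn, args = ( N, ) ) for _ in range( 2 ) ]
for t in pair: t.start()
for t in pair: t.join()
threaded_s = time.perf_counter() - t0

# on a GIL-coordinated CPython, threaded_s will not be ~ sequential_s / 2;
# it stays about the same ( or worse, due to the GIL-handling overheads )
print( sequential_s, threaded_s )
```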
"Why does the following example of CPU-bound code run in parallel and never block?"
Well, everything still gets executed as a "pure"-[SERIAL] sequence of small amounts of time, during each of which the CPU works on one and only one Python thread, internally interrupted once each GIL-lock-holding quantum has been spent. The result only seems as if all the work were "quasi"-concurrently worked on; it is still a sequential execution of the actual work, super-sampled into small time-quanta-of-work and performed one after another till the work is finished.
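In CPython 3 the size of such a time-quantum is the interpreter's "switch interval", and it can be inspected or tuned at runtime ( a small sketch; 0.005 [s] is the CPython default ):

```python
import sys

print( sys.getswitchinterval() )   # default: 0.005 [s] per thread time-quantum
sys.setswitchinterval( 0.001 )     # ask for finer-grained ( and thus more
print( sys.getswitchinterval() )   # overhead-costly ) thread switching
```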
So, the Python threads actually pay a lot of overhead-costs ( reading, re-reading, at some point a POSACK'd acquiring and later a forced releasing of the Python in-software GIL-lock ), which costs you a deal of performance-overhead, while you receive nothing in exchange for all that many-threads overhead processing. Nothing but worse performance ( Q.E.D. above in @galaxyan's test results ).
You would have felt that on your own, if instead of calling a simple `fib( 32 )` you had evaluated something a bit more demanding, like:

```python
len( str( [ math.factorial( 2**f ) for f in range( 20 ) ][-1] ) )
```
( Btw. note that `fib()` cannot be the way to go here, as its recursive formulation will, on something like `fib( 10**N )`, start crashing right after your `N` grows over the limit of the Python interpreter's configuration threshold, set for the Python maximum recursion depth ... )
```python
import datetime

def fib( n ):                          # naive recursive Fibonacci worker
    return n if n < 2 else fib( n - 1 ) + fib( n - 2 )

def aCrashTestDemoWORKER( id, aMaxNUMBER ):
    MASKs = "INF: {2:} tid[{0:2d}]:: fib(2:{1:}) processing start..."
    MASKe = "INF: {2:} tid[{0:2d}]:: fib(2:{1:}) processing ended...{3:}"
    safeM = 10**max( 2, aMaxNUMBER )   # for aMaxNUMBER >= 3 the recursion
    pass;  print( MASKs.format( id, safeM, datetime.datetime.utcnow() ) )
    len( [ fib( someN ) for someN in range( safeM ) ] )
    pass;  print( MASKe.format( id, safeM, datetime.datetime.utcnow(), 20*"_" ) )
```
Q: Is the GIL ever unlocked for non-IO-bound work?

Yes, it can be. Some work can indeed be done GIL-free.
One way, harder to arrange, is to rather use `multiprocessing` with its sub-process-based backend. This avoids the GIL-locking, yet you pay quite a remarkable price: as many full copies of the whole Python-session state get allocated ( interpreter + all imported modules + all internal data-structures, whether needed for such distributed computations or not ), plus your ( now INTER-PROCESS ) communication performs serialisation / deserialisation before / after sending even a single bit of information there or back ( that is painful ). For the details on the actual "Economy"-of-costs, one may like to read the Amdahl's-law re-formulation that reflects the impacts of both these overheads and the atomic-processing durations.
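A minimal sketch of that `multiprocessing` route ( the `fib()` worker and the pool size are illustrative assumptions ): each sub-process runs its own interpreter, hence owns its own GIL, so CPU-bound work can finally run truly in parallel:

```python
from multiprocessing import Pool

def fib( n ):                           # naive, intentionally CPU-bound worker
    return n if n < 2 else fib( n - 1 ) + fib( n - 2 )

if __name__ == "__main__":
    with Pool( processes = 4 ) as pool: # 4 full python-session copies spawned
        # each argument and each result crosses the process boundary
        # via pickling - the serialisation overhead mentioned above
        print( pool.map( fib, [ 25, 25, 25, 25 ] ) )
        # -> [75025, 75025, 75025, 75025]
```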
Another case is using `numba.jit()`-compiled or pre-compiled code, where the smart `numba`-based LLVM compiler may get instructed in the decorator, via call signature(s) and other details, to work in a `nogil = True` mode, so as to generate code that need not use the ( expensive ) GIL-signalling, where appropriate to ask for such a comfort.
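A minimal sketch of that `numba` route ( hedged: it needs the `numba` package installed; the `cpu_heavy_sum()` function is an illustrative assumption, and a no-op fallback decorator is added so the snippet also runs where `numba` is absent, just without the GIL-free benefit ):

```python
try:
    from numba import jit               # the real LLVM-compiling decorator
except ImportError:
    def jit( **kwargs ):                # no-op fallback if numba is missing
        def decorate( f ):
            return f
        return decorate

@jit( nopython = True, nogil = True )   # nogil = True: the compiled machine-
def cpu_heavy_sum( n ):                 # code releases the GIL while running
    s = 0
    for i in range( n ):
        s += i * i
    return s

# several threads calling cpu_heavy_sum() may now run truly in parallel
print( cpu_heavy_sum( 10 ) )            # -> 285
```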
The last case is to move into a heterogeneous distributed-computing design, where Python remains the coordinator and the remote, distributed computing units are GIL-free number-crunchers, for which the Python-internal GIL-logic has no meaning and is by design ignored.
BONUS PART:

For more details on computing-intensive performance tricks, you may like this post on monitoring threads' overheads.