
Time taken for the sequential code (seq.py),

import time

def countDown(n):
    # Pure CPU-bound loop: no I/O, just decrementing a counter.
    while n > 0:
        n -= 1

n = 50000000
start = time.time()
countDown(n)
end = time.time()
print(end - start)

is,

$ python3.6 seq.py 
 4.209718227386475
$ python3.6 seq.py 
 4.007786750793457
$ python3.6 seq.py 
 4.0265843868255615
$

Time taken for the threaded version (usingThreads.py),

from threading import Thread
import time

def countDown(n):
    # Same CPU-bound loop as in seq.py.
    while n > 0:
        n -= 1

n = 50000000

# Split the work in half across two daemon threads.
t1 = Thread(target=countDown, args=(n//2,))
t1.daemon = True
t2 = Thread(target=countDown, args=(n//2,))
t2.daemon = True

start = time.time()
t1.start()
t2.start()
t1.join()
t2.join()
end = time.time()
print(end - start)

is,

$ python3.6 usingThreads.py 
 4.1083903312683105
$ python3.6 usingThreads.py 
 4.093154668807983
$ python3.6 usingThreads.py 
 4.092989921569824
$ python3.6 usingThreads.py 
 4.116031885147095
$

$ nproc
 4
$

The Python interpreter should not allow CPU-bound threads to release the GIL.

I expected usingThreads.py to take more execution time than seq.py, because:

1) Only one thread executes at a time, despite the 4 cores.

2) The time spent on failed attempts by thread2 to acquire the GIL from thread1 (and vice versa) should add delay to the execution.
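
For reference, CPython forces the running thread to offer to release the GIL at a fixed switch interval, even for pure CPU-bound code that never performs I/O. A minimal sketch to inspect and tune that interval (the 0.05 value is an arbitrary choice for illustration, not something from the timings above):

import sys

# CPython asks the running thread to release the GIL every
# switch-interval seconds (default 0.005s in Python 3), even for
# pure CPU-bound code that does no I/O.
print(sys.getswitchinterval())   # 0.005 by default

# A larger interval means fewer GIL hand-off attempts; 0.05 is an
# arbitrary value used purely for illustration.
sys.setswitchinterval(0.05)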

Edit:

With n=500000000

$ python3.6 seq.py 
 40.22602105140686
$ python3.6 seq.py 
 40.510098457336426
$ python3.6 seq.py 
 40.04688620567322
$
$ python3.6 usingThreads.py 
 40.91394829750061
$ python3.6 usingThreads.py 
 42.30081081390381
$ python3.6 usingThreads.py 
 41.328694581985474

Question:

Why does usingThreads.py perform better than seq.py?

overexchange
  • Looks to me like using threads is usually slower, only faster than one of the seq.py times. – DisappointedByUnaccountableMod Jul 06 '17 at 16:18
  • I suspect it's because of your use of `join()` in the threaded version—which probably removes most of the overhead usually associated with using threads. – martineau Jul 06 '17 at 16:37
  • @martineau That is the only way I know to wait for threads to complete and then calculate the time taken for the complete work. How do I avoid `join()`? – overexchange Jul 06 '17 at 16:48
  • Use the `Thread.is_alive()` method. i.e. `while t1.is_alive() and t2.is_alive(): pass`. – martineau Jul 06 '17 at 17:02
  • @martineau Is that not a busy wait, unlike `join()`? – overexchange Jul 06 '17 at 17:14
  • Don't think so...because it allows both threads to run at the same time. Try it and see if it makes a difference. – martineau Jul 06 '17 at 17:27
  • @martineau With your changes, it takes `62.608973026275635` seconds. But [here](https://docs.python.org/2/tutorial/controlflow.html#pass-statements) it says this is a busy wait, which is why it would take more time. – overexchange Jul 06 '17 at 21:12
  • Nothing wrong with that. The main thread _needs_ to wait until the two other threads finish so it can calculate and print the elapsed time and there's nothing else for it to do. Using a `while` loop as shown is what gives the two countdown threads a chance to run while the main thread is waiting—and the results prove there is indeed some overhead involved when using threads. – martineau Jul 06 '17 at 22:12
  • @martineau Why shouldn't I use `join()` to wait for completion of the threads, which is not a busy-wait? – overexchange Jul 06 '17 at 23:32
  • Because `join`ing prevents Python's normal thread-switching mechanism (sharing the interpreter via the GIL) from happening, which explains the results shown in your question. For that reason the alternative `while` loop suggested is not what is usually meant by [busy-waiting](https://en.wikipedia.org/wiki/Busy_waiting) since it allows that additional processing to occur. – martineau Jul 07 '17 at 00:25
  • @martineau So, do you mean `is_alive()` allows thread switching but `join()` doesn't allow thread switching? – overexchange Jul 07 '17 at 00:40
  • Yes. The `join` method blocks the main thread's execution until the thread `join`ed exits, which is different from the "poll the threads until they're both done" loop the `while` is performing. – martineau Jul 07 '17 at 00:48
  • @martineau But the worker threads are context switching, which is what I want. Main-thread blocking is intended for efficiency; otherwise, `is_alive()` in a while loop would take its own CPU cycles. – overexchange Jul 07 '17 at 00:57
  • Computationally bound threads do not release the GIL since they do no I/O. Try adding a `sleep(.0001)` inside the `countDown` loop to allow it to be released. – martineau Jul 07 '17 at 01:19
  • @martineau OK, I see. The worker threads are running sequentially. Except that the main thread is blocked by the `join()` call, waiting for the workers to complete, it looks the same as `seq.py`. So this is why you suggest `is_alive()` in a `while` loop in the main thread, to make it different. Is that correct? – overexchange Jul 07 '17 at 01:31
  • Yes, it sounds like you're grasping the basics of my reasoning—to avoid the blocking effects of `join()` which prevents other threads from running. – martineau Jul 07 '17 at 04:17
  • @martineau On context switching using sleep(), the thread that acquires the GIL may run on the same or a different core. My observation is that configuring a single core for my code performs better than 4 cores. – overexchange Jul 07 '17 at 05:04
  • No, threads always run on the same cpu/core as the main thread and share the same interpreter amongst themselves, which is why there is a need for the GIL. If you want to run your code on multiple cores, use the `multiprocessing` module, where each thread is executed as a separate process, so it will have its own copy of the interpreter. However, the overhead can be much larger. Read the linked material in @DorElias's answer. Also have a look at the article [**_Python Threads and the Global Interpreter Lock_**](http://jessenoller.com/blog/2009/02/01/python-threads-and-the-global-interpreter-lock). – martineau Jul 07 '17 at 12:54
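
For completeness, a runnable sketch of the polling alternative martineau suggests above, with the two `join()` calls replaced by an `is_alive()` loop (it uses `or` rather than the `and` quoted in the comment, so the main thread keeps waiting until both workers have finished):

from threading import Thread
import time

def countDown(n):
    while n > 0:
        n -= 1

n = 50000000
t1 = Thread(target=countDown, args=(n//2,), daemon=True)
t2 = Thread(target=countDown, args=(n//2,), daemon=True)

start = time.time()
t1.start()
t2.start()
# Busy-wait instead of join(): keep polling until *both* workers
# have exited ('or' here, since 'and' would end the loop as soon
# as either thread finished).
while t1.is_alive() or t2.is_alive():
    pass
end = time.time()
print(end - start)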

1 Answer


Both versions of the code do the same amount of work, so they take almost the same amount of time (both count down 50000000 times in total).

The GIL makes it so the threads won't run in parallel (so the threaded version is not faster), but the overhead from the context switches is relatively small, so you got almost the same result.

There is an explanation here: http://www.dabeaz.com/python/UnderstandingGIL.pdf

He uses the same example as you, and in that presentation he got a slower threaded version when he used a computer with more than one CPU. He explains it pretty well: when you use more than one CPU, you get more overhead (more attempts to context switch), which makes your program slower.
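
To actually use all 4 cores for this kind of CPU-bound work, the comments point to the `multiprocessing` module; a minimal sketch of that variant (each worker process gets its own interpreter and its own GIL, so the two halves of the countdown can genuinely run on separate cores):

from multiprocessing import Process
import time

def countDown(n):
    while n > 0:
        n -= 1

if __name__ == '__main__':
    n = 50000000
    p1 = Process(target=countDown, args=(n//2,))
    p2 = Process(target=countDown, args=(n//2,))

    start = time.time()
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    end = time.time()
    # With a separate interpreter (and GIL) per process, expect roughly
    # half the sequential time, minus process start-up overhead.
    print(end - start)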

DorElias