Why is Python multithreading in this example so slow?

Question

From what I understand, multithreading is supposed to speed up when the program is IO bound. Why is the example below so much slower, and how can I make it produce the exact same result as the single threaded version?

Single-threaded:

Time: 1.8115384578704834

import time

start_time = time.time()

def testThread(num):
    num = ""
    for i in range(500):
        num += str(i % 10)
        a.write(num)


def main():
    test_list = [x for x in range(3000)]

    for i in test_list:
        testThread(i)

if __name__ == '__main__':
    a = open('single.txt', 'w')
    main()
    print(time.time() - start_time)

Multi-threaded:

Time: 22.509746551513672

import threading
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.pool import ThreadPool
import time

start_time = time.time()


def testThread(num):
    num = ""
    for i in range(500):
        num += str(i % 10)
        with global_lock:
            a.write(num)


def main():
    test_list = [x for x in range(3000)]

    with ThreadPool(4) as executor:
        results = executor.map(testThread, test_list)

    # with ThreadPoolExecutor() as executor:
    #    results = executor.map(testThread, test_list)


if __name__ == '__main__':
    a = open('multi.txt', 'w')
    global_lock = threading.Lock()
    main()
    print(time.time() - start_time)

Also, how is ThreadPoolExecutor different from ThreadPool

*multithreading is supposed to speed up when the program is IO bound*. No, not true. What is true is that the global interpreter lock is not asserted for I/O. Note that, despite multiple threads, your code is not concurrent because all IO is synchronized with a lock. — President James K. Polk, Aug 22 '19 at 01:26
If you are *blocking* on an I/O operation then multithreading will allow you to make progress in other threads while the blocked thread waits for the I/O operation to become unblocked. — President James K. Polk, Aug 22 '19 at 01:28
Network based or tasks where you don't have to write to the same file in parallel. — Pedro Lobito, Aug 22 '19 at 01:29
How could I modify the code above to make the multi-threaded version faster than the single-threaded? — Hyrial, Aug 22 '19 at 01:36
@Hirwo I updated the answer, but the key point is that parallelization will only increase performance for network/multiple file I/O or programs where the computation is the bottleneck. — Mars, Aug 22 '19 at 01:40
Possible duplicate of [Parallel file writing is it efficient?](https://stackoverflow.com/questions/34667551/parallel-file-writing-is-it-efficient) — Mars, Aug 22 '19 at 01:41
I thought multiprocessing is better when computation(cpu) is the bottleneck? — Hyrial, Aug 22 '19 at 01:42
That depends on if your program "waits" when you're doing I/O. If I'm doing something like requesting data from a website/API, I want my program to do other things while it waits for a response. — Mars, Aug 22 '19 at 01:51
For python, multiProcessing allows you to use multiple CPUs. multiThreading does not. If your computations are running your a single core at 100%, then you will likely see an improvement by using multiProcessing — Mars, Aug 22 '19 at 01:53
So the optimal balance of time for multi-threading on one file I/O would be around 50% I/O and 50% cpu? I am understanding "waiting" as time spent on file I/O or requesting data? — Hyrial, Aug 22 '19 at 02:02
Yeah, for a two threaded program, 50% file IO and 50% CPU. That means while thread A is writing, thread B is processing. Then while thread B is writing, thread A is writing. If it's 4 threads, you would want writing to be 25%. — Mars, Aug 22 '19 at 02:06
Yes, waiting here means time spent on file I/O or requesting data. If some threads cannot *work* while other threads are *waiting* you can't actually see an improvement. — Mars, Aug 22 '19 at 02:08

Mars · Accepted Answer · 2019-08-22T02:42:53.040

Edit. Forgot about GIL, silly me.

The below code shows a general approach for other multithreaded languages, as they will have the same problem.

However, in this case, the language is Python. Due to the GIL, there will only ever be one active thread at a time in this case, as none of the threads ever actually yield. (The OS may forcefully swap threads mid-way, but that doesn't change the fact that only one thread would ever run at a time.

For computation and File I/O, Python won't see any gains by multithreading.
In order to increase computation, you can multiprocess. However, you still can't speed up the file I/O

for i in range(500):
        num += str(i % 10)
        with global_lock:
            a.write(num)

You now have 4 threads locking and writing, which is the slower part of the function.
Essentially, You have 4 threads doing 1 task, and you added a ton of overhead and wait time on top if it.

From the comments, here is something that may help:

unordered_output_list = []
def testThread(num):
    num = ""
    for i in range(500):
        num += str(i % 10)
        unordered_output_list.append(num)

def main():
    test_list = [x for x in range(3000)]

    with ThreadPool(4) as executor:
        results = executor.map(testThread, test_list)

    # with ThreadPoolExecutor() as executor:
    #    results = executor.map(testThread, test_list)
      for num in unordered_output_list:
          a.write(num)

if __name__ == '__main__':
    a = open('multi.txt', 'w')
    main()
    print(time.time() - start_time)

Here we do the processing in parallel (a little speed up), then we write in sequence.

From this answer

Short answer: physically writing to the same disk from multiple threads at the same time, will never be faster than writing to that disk from one thread (talking about normal hard disks here). In some cases it can even be a lot slower.

For something like this, where you're writing and the order doesn't matter, I'd collect everything that needs to be printed into a single write — Mars, Aug 22 '19 at 01:30
@Hirwo Pedro's answer is also good, if it's applicable. In that case you'll be using multiple outlets to `write` to the DB, so it can be done in parallel — Mars, Aug 22 '19 at 01:31
Ok, so Python multi-threading can only see gains from requesting data? — Hyrial, Aug 22 '19 at 13:47
@Hirwo Or other types of processing where a thread *yields* (sleeps). For example, if you make a bunch of threads that print the time once per second, if you do something like `while datetime.now() < (starttime + 60): pass//Do nothing` only one thread will run. If you do `while datetime.now() < (starttime + 60) : time.sleep(1)` (pointless code) all of the threads will run — Mars, Aug 23 '19 at 00:26

Why is Python multithreading in this example so slow?

1 Answers1

Edit. Forgot about GIL, silly me.