
I have written an algorithm and am trying to compare the performance of different versions. My benchmark function uses a thread pool, but it takes the same time or longer than the single-threaded implementation. I have tried both PyPy and CPython 3.11, and the result is the same.

Method to benchmark:

import time
from queue import Queue

# get_set_from_dict_file() and getPairs() are defined elsewhere in the project
def main(print_results=True):
    results = Queue()
    start_time = time.time()

    words = get_set_from_dict_file("usa.txt")
    results.put(f"Total words read: {len(words)}")
    results.put(f"Total time taken to read the file: {round((time.time() - start_time) * 1000)} ms")
    start_time_2 = time.time()

    pairs = getPairs(words)
    results.put(f"Number of words that can be built with 3 letter word + letter + 3 letter word: {len(pairs)}")

    results.put(f"Total time taken to find the pairs: {round((time.time() - start_time_2) * 1000)} ms")

    results.put(f"Time taken: {round((time.time() - start_time) * 1000)}ms")

    if print_results:
        [print(x) for x in results.queue]
    return (time.time() - start_time) * 1000

Multithreaded ThreadPool benchmark:

import os
from math import floor
from multiprocessing.pool import ThreadPool

def benchmark(n=1000):
    # start number of threads equal to 90% of cores running main() using multiprocessing, continue until n runs complete
    core_count = os.cpu_count()
    thread_num = floor(core_count * 0.9)
    pool = ThreadPool(thread_num)

    results = pool.map_async(main, [False] * n)
    results = results.get()
    pool.close()
    avg_time_ms = round(sum(results) / len(results))
    # Save best run time and its code as a pickle file in format (time, code)
    # Currently hidden code
    return avg_time_ms, -1

Test:

if __name__ == "__main__":
    print("Do you want to benchmark? (y/n)")
    if input().upper() == "Y":
        print("Benchmark n times: (int)")
        n = input()
        n = int(n) if (n.isdigit() and 0 < int(n) <= 1000) else 100
        start = time.time()
        bench = benchmark(n)
        end = time.time()
        print("\n----------Multi-Thread Benchmark----------")
        print(f"Average time taken: {bench[0]} ms")
        print(f"Best time taken yet: {bench[1]} ms")
        print(f"Total bench time: {end - start:0.5} s")

        start = time.time()
        non_t_results = [main(False) for _ in range(n)]
        end = time.time()
        print("\n----------Single-Thread Benchmark----------")
        print(f"Average time taken: {round(sum(non_t_results) / len(non_t_results))} ms")
        print(f"Total bench time: {end - start:0.5} s")

    else:
        main()

Every time I run it, no matter the number of runs or threads in the pool, the pool never completes faster. Here is an example output:

Do you want to benchmark? (y/n)
y
Benchmark n times: (int)
50

----------Multi-Thread Benchmark----------
Average time taken: 276 ms
Best time taken yet: -1 ms
Total bench time: 2.2814 s

----------Single-Thread Benchmark----------
Average time taken: 36 ms
Total bench time: 1.91 s

Process finished with exit code 0

I expected the thread pool to finish faster.

  • You are not supposed to use threads for CPU-bound tasks, so the completion time should not be expected to improve. Said otherwise: do not use threads for parallelism; they are not made for that. Only one Python instruction is executed at a time, even if you have 10 threads wanting to execute one. Threads are made for when you have different tasks waiting on a blocking condition, for example a web server with 1 thread waiting for new connections and k threads serving established connections (where the limiting stage is reading the request and writing the result). – chrslg Jan 18 '23 at 22:30
  • Python multithreading can still only do one thing at a time when it involves the Python interpreter. It works best when the Python code is waiting for external events, such as a web page to come up. There is no reason why multithreading should speed up your code. – Frank Yellin Jan 18 '23 at 22:32
  • I missed your use of the term "single-core", which confirms my suspicion (a strong one, but not certain so far) that you mean to use threads to distribute computation across several cores. You can't. Thread computations don't run concurrently for Python operations (you can still call a C function that runs its computation in a pthread and wait for it to finish). – chrslg Jan 18 '23 at 22:32
  • Oh, thank you. So the processes argument for ThreadPool is really the number of threads. For actual separate processes I should be using multiprocessing.Pool, not ThreadPool? – RoboCreeper707 Jan 18 '23 at 23:00
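
To illustrate the commenters' point, here is a minimal sketch (the cpu_bound() helper is hypothetical, not part of the question) that pushes the same CPU-bound work through a ThreadPool and through a process Pool. Because the GIL serializes pure-Python bytecode, only the process pool should show a real speedup on a multi-core machine:

import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def cpu_bound(n):
    # Pure-Python arithmetic loop: holds the GIL for its whole duration.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    work = [2_000_000] * 8
    for name, make_pool in (("ThreadPool", ThreadPool), ("Pool", Pool)):
        start = time.time()
        with make_pool(4) as pool:
            pool.map(cpu_bound, work)
        print(f"{name}: {time.time() - start:.2f} s")

With loops like this, the ThreadPool run takes roughly as long as running the eight jobs back to back, while the process pool divides the wall time by roughly the number of workers.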

1 Answer


It turns out I was using threads instead of processes. Thanks to the commenters I was able to understand that ThreadPool is for concurrent (I/O-bound) processing, while multiprocessing.Pool is for parallel (CPU-bound) processing.

Here is the changed benchmark:

import os
from math import floor
from multiprocessing import Pool

def benchmark(n=1000):
    # start a number of processes equal to 90% of cores running main(), continue until n runs complete
    core_count = os.cpu_count()
    process_num = floor(core_count * 0.9)

    with Pool(process_num) as pool:
        results = pool.map_async(main, [False] * n)
        results = results.get()
    avg_time_ms = round(sum(results) / len(results))
    # Save best run time and its code as a pickle file in format (time, code)
    """..."""
    return avg_time_ms, -1
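
One detail worth keeping in mind (not spelled out above): multiprocessing.Pool pickles main and runs it in separate worker processes, so the benchmark call must stay behind the if __name__ == "__main__": guard, exactly as in the test block from the question. A minimal sketch of invoking the changed version, assuming benchmark() is defined as above:

if __name__ == "__main__":
    # The guard is required so that spawned worker processes
    # do not re-run the benchmark when they import this module.
    avg_ms, best_ms = benchmark(50)
    print(f"Average time per main() call: {avg_ms} ms")

Since results already holds every per-run time, min(results) would also give the fastest single run without any extra bookkeeping.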