
I'm experimenting with Python 3's multiprocessing module and have the following code, which reads a file with one number per line and prints each number's factorization:

import multiprocessing
import sys

NUM_PROCESSES = 4
CHUNK_SIZE = 20

def factor(n):
    # Return a list of the prime factors of n
    factors = []
    # Factor out 2s first
    while n % 2 == 0:
        factors.append(2)
        n //= 2
    factor = 3
    while n > 1:
        if n % factor == 0:
            factors.append(factor)
            n //= factor
        else:
            factor += 2
    return factors

def process_line(line):
    # Process one line of the input file
    number = int(line)
    factorization = '*'.join(str(x) for x in factor(number))
    return f'{number} = {factorization}\n'

if __name__ == '__main__':
    with open('input.txt') as f:
        with multiprocessing.Pool(NUM_PROCESSES) as pool:
            processed = pool.imap(process_line, f, CHUNK_SIZE)
            sys.stdout.writelines(processed)

I am running this on a Linux system with 2 hyperthreaded physical cores, for a total of 4 virtual cores. I tested this script on a file containing 10,000 random numbers between 2 and 1,000,000, and measured its performance with the time command.
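
For completeness, a minimal sketch of how such an input file can be generated (not necessarily the exact script I used, but it produces 10,000 random numbers in that range):

import random

with open('input.txt', 'w') as f:
    for _ in range(10000):
        f.write(f'{random.randint(2, 1000000)}\n')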

When NUM_PROCESSES is 1, I get the result:

real    0m26.997s
user    0m26.979s
sys     0m0.077s

When NUM_PROCESSES is 2, I get:

real    0m13.477s
user    0m26.809s
sys     0m0.048s

So far this is what I'd expect - adding another process cuts the run time almost exactly in half while the total CPU time remains the same. But when NUM_PROCESSES is 4, I get:

real    0m14.598s
user    0m56.703s
sys     0m0.059s

Not only did the run time not decrease, it actually increased by about a second, even though the total CPU time doubled. This suggests each virtual CPU was effectively running at half the speed of a physical CPU, so there was no performance benefit from using all 4 virtual CPUs. Changing CHUNK_SIZE does not seem to significantly affect the performance, even if I set it to 1 or 2,500. Using map() instead of imap() also doesn't change anything.
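
For reference, the map() variant only changes the pool block; it looked roughly like this:

with open('input.txt') as f:
    with multiprocessing.Pool(NUM_PROCESSES) as pool:
        # map() collects all results into a list before writing them out
        processed = pool.map(process_line, f, CHUNK_SIZE)
        sys.stdout.writelines(processed)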

From my understanding, virtual hyperthreaded cores do not offer the same performance benefits as additional physical cores, but they should still offer some improvement, right? Why then did the script's performance not improve?

turtletuna
  • You need to leave at least one thread for the system to run. So try with NUM_PROCESSES = 3. – Denis Rasulev Oct 19 '18 at 10:46
  • This article may be of interest; it explains many things: https://medium.com/@bfortuner/python-multithreading-vs-multiprocessing-73072ce5600b – Denis Rasulev Oct 19 '18 at 10:50
  • If I use 3 processes I get real 0m13.809s, user 0m41.019s, so still no improvement. The task I am running _is_ CPU intensive; it performs no I/O at all. That is why I figure that more processes should mean better performance. – turtletuna Oct 19 '18 at 10:51
  • Hyperthreading only adds something like 20-30% performance on a good day. For some workloads the performance benefit of hyperthreading is zero or negative. – Dietrich Epp Oct 19 '18 at 12:30

1 Answer


(Sorry, I can't add a comment yet.) Can you show the output of lscpu? Creating too many processes/threads will cause context switching. When you run this and watch htop, are all cores at 100%?
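
As a quick check from Python itself, os.cpu_count() reports the number of logical CPUs (hyperthreads count separately):

import os

# On a machine with 2 hyperthreaded physical cores this should print 4
print(os.cpu_count())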

anon