I'm experimenting with Python 3's multiprocessing module, and have the following code that reads a file containing a number on each line and prints the factorization of each number:
```python
import multiprocessing
import sys

NUM_PROCESSES = 4
CHUNK_SIZE = 20


def factor(n):
    """Return a list of the prime factors of n, in ascending order."""
    factors = []
    # Factor out 2's
    while n % 2 == 0:
        factors.append(2)
        n //= 2
    # Trial division by odd numbers (renamed from `factor` so it doesn't shadow the function)
    divisor = 3
    while n > 1:
        if n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        else:
            divisor += 2
    return factors


def process_line(line):
    """Process one input line: 'n' -> 'n = p1*p2*...'."""
    number = int(line)
    factorization = '*'.join(str(x) for x in factor(number))
    return f'{number} = {factorization}\n'


if __name__ == '__main__':
    with open('input.txt') as f:
        with multiprocessing.Pool(NUM_PROCESSES) as pool:
            # imap sends lines to the workers in chunks of CHUNK_SIZE
            # and yields the results lazily, in input order
            processed = pool.imap(process_line, f, CHUNK_SIZE)
            sys.stdout.writelines(processed)
```
I am running this on a Linux system with 2 hyperthreaded physical cores, for a total of 4 virtual cores. I tested this script on a file containing 10,000 random numbers between 2 and 1,000,000, and measured its performance with the `time` command.
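The input file itself isn't included here. A minimal sketch for generating a comparable one, assuming uniformly distributed integers and a fixed seed (the helper names `make_numbers` and `write_test_input` are hypothetical, not part of the original script):

```python
import random

# Parameters taken from the description above: 10,000 numbers in [2, 1,000,000].
NUM_LINES = 10_000
LOW, HIGH = 2, 1_000_000


def make_numbers(n=NUM_LINES, seed=0):
    """Return n pseudo-random integers in [LOW, HIGH].

    The fixed seed is an assumption, chosen so that repeated benchmark
    runs factor exactly the same numbers.
    """
    rng = random.Random(seed)
    return [rng.randint(LOW, HIGH) for _ in range(n)]


def write_test_input(path='input.txt', n=NUM_LINES, seed=0):
    """Write the numbers one per line, the format process_line() expects."""
    with open(path, 'w') as f:
        f.writelines(f'{x}\n' for x in make_numbers(n, seed))
```

Calling `write_test_input()` once creates `input.txt` in the current directory. The exact numbers from the original test are unknown, so absolute timings will differ somewhat.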
When `NUM_PROCESSES` is 1, I get the result:
```
real 0m26.997s
user 0m26.979s
sys 0m0.077s
```
When `NUM_PROCESSES` is 2, I get:
```
real 0m13.477s
user 0m26.809s
sys 0m0.048s
```
So far this is what I'd expect: adding another process cuts the run time almost exactly in half while the total CPU time stays the same. But when `NUM_PROCESSES` is 4, I get:
```
real 0m14.598s
user 0m56.703s
sys 0m0.059s
```
Not only did the run time not decrease, it actually increased by about 1 second, even though the total CPU time doubled. In effect, each virtual CPU ran at roughly half the speed of a physical CPU, so there was no performance benefit from running on all 4 virtual CPUs. Changing `CHUNK_SIZE` does not seem to significantly affect performance, even if I set it to 1 or 2,500. Using `map()` instead of `imap()` doesn't change anything either.
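The "half speed" reading follows from a little arithmetic on the three `time` results. A small sketch (the names `TIMINGS` and `summarize` are only for illustration) that computes the wall-clock speedup, the average number of busy CPUs (user / real), and the work done per CPU-second relative to a single process (user time for 1 process divided by user time for N):

```python
# Timings reported above: NUM_PROCESSES -> (real, user), in seconds.
TIMINGS = {
    1: (26.997, 26.979),
    2: (13.477, 26.809),
    4: (14.598, 56.703),
}


def summarize(timings):
    """Return {n: (speedup, busy_cpus, relative_speed)} for each process count."""
    base_real, base_user = timings[1]
    rows = {}
    for n, (real, user) in sorted(timings.items()):
        speedup = base_real / real         # wall-clock speedup vs. 1 process
        busy_cpus = user / real            # average number of CPUs kept busy
        relative_speed = base_user / user  # work per CPU-second vs. 1 process
        rows[n] = (speedup, busy_cpus, relative_speed)
    return rows


for n, (speedup, busy, rel) in summarize(TIMINGS).items():
    print(f'{n} processes: speedup {speedup:.2f}x, {busy:.2f} CPUs busy, '
          f'each doing {rel:.0%} of the work per second of a single process')
```

With 4 processes, almost 4 CPUs are busy, but each CPU-second accomplishes only a bit under half as much work as it does with 1 or 2 processes, so the net speedup stays below 2x.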
From my understanding, virtual hyperthreaded cores do not offer the same performance benefits as additional physical cores, but they should still offer some improvement, right? Why then did the script's performance not improve?
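On Linux, one way to double-check the topology is to count distinct (`physical id`, `core id`) pairs in `/proc/cpuinfo`, since hyperthreads on the same physical core share a pair. A sketch, assuming x86-style `/proc/cpuinfo` fields (other architectures may omit `core id`, in which case the helper returns `None` for the physical count); `count_cores` is a hypothetical helper. Note that `os.cpu_count()` reports logical CPUs, which is also what `multiprocessing.Pool()` uses by default when no process count is given.

```python
import os


def count_cores(cpuinfo_text):
    """Return (logical_cpus, physical_cores) parsed from /proc/cpuinfo text.

    Each logical CPU is a blank-line-separated block. Hyperthreads on the
    same physical core share a ('physical id', 'core id') pair. If any CPU
    block lacks 'core id', physical_cores is returned as None.
    """
    logical = 0
    cores = set()
    has_topology = True
    for block in cpuinfo_text.strip().split('\n\n'):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(':')
            if sep:
                fields[key.strip()] = value.strip()
        if 'processor' not in fields:
            continue  # skip blocks that don't describe a CPU
        logical += 1
        if 'core id' not in fields:
            has_topology = False
        cores.add((fields.get('physical id'), fields.get('core id')))
    return logical, (len(cores) if has_topology else None)


if __name__ == '__main__':
    print('os.cpu_count():', os.cpu_count())  # logical CPUs
    if os.path.exists('/proc/cpuinfo'):
        with open('/proc/cpuinfo') as f:
            print('(logical, physical):', count_cores(f.read()))
```

On the machine described above, this would be expected to report 4 logical CPUs and 2 physical cores.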