If I import numpy in a single process, it takes approximately 0.0749 seconds:
python -c "import time; s=time.time(); import numpy; print(time.time() - s)"
Now if I run the same code in multiple Processes simultaneously, each import is significantly slower:
import subprocess

cmd = 'python -c "import time; s=time.time(); import numpy; print(time.time() - s)"'
for n in range(5):
    m = 2**n
    print(f"Importing numpy on {m} Process(es):")
    processes = []
    for i in range(m):
        processes.append(subprocess.Popen(cmd, shell=True))
    for p in processes:
        p.wait()
    print()
gives the output:
Importing numpy on 1 Process(es):
0.07726049423217773
Importing numpy on 2 Process(es):
0.110260009765625
0.11645245552062988
Importing numpy on 4 Process(es):
0.13133740425109863
0.1264667510986328
0.13683867454528809
0.153900146484375
Importing numpy on 8 Process(es):
0.13650751113891602
0.15682148933410645
0.17088770866394043
0.1705784797668457
0.1690073013305664
0.18076491355895996
0.18901371955871582
0.18936467170715332
Importing numpy on 16 Process(es):
0.24082279205322266
0.24885773658752441
0.25356197357177734
0.27071142196655273
0.29327893257141113
0.2999141216278076
0.297823429107666
0.31664466857910156
0.20108580589294434
0.33217334747314453
0.24672770500183105
0.34597229957580566
0.24964046478271484
0.3546409606933594
0.26511287689208984
0.2684178352355957
The import time per Process seems to grow almost linearly with the number of Processes (especially as the number of Processes grows large), so in total we seem to spend about O(n^2) time on importing. I know there is an import lock, but I'm not sure why it is there. Are there any workarounds? And if I work on a server with many users running many tasks, could I be slowed down by someone spawning tons of workers that just import common packages?
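One workaround that should apply when multiprocessing uses the fork start method (the default on Linux): import numpy once in the parent before spawning workers, so the forked children inherit the already-initialized module and their import numpy becomes just a sys.modules lookup. A minimal sketch:

import multiprocessing
import time

import numpy  # imported once in the parent; fork()ed children inherit it

def f(_):
    s = time.time()
    import numpy as np  # hits sys.modules, no filesystem work or re-init
    return time.time() - s

if __name__ == "__main__":
    # relies on the "fork" start method; "spawn" would re-import in each child
    with multiprocessing.Pool(16) as p:
        print(max(p.map(f, range(16))))  # should be near zero: no re-import happens

This obviously doesn't help independent subprocess.Popen interpreters, which always start cold.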
The pattern is clearer for larger n; here's a script that shows it more clearly by reporting only the average import time across n workers:
import multiprocessing
import time

def f(x):
    s = time.time()
    import numpy as np
    return time.time() - s

if __name__ == "__main__":
    for n in range(10):
        m = 2**n
        with multiprocessing.Pool(m) as p:
            print(f"importing with {m} worker(s): {sum(p.map(f, range(m)))/m}")
output:
importing with 1 worker(s): 0.06654548645019531
importing with 2 worker(s): 0.11186492443084717
importing with 4 worker(s): 0.11750376224517822
importing with 8 worker(s): 0.14901494979858398
importing with 16 worker(s): 0.20824094116687775
importing with 32 worker(s): 0.32718323171138763
importing with 64 worker(s): 0.5660803504288197
importing with 128 worker(s): 1.034045523032546
importing with 256 worker(s): 1.8989756992086768
importing with 512 worker(s): 3.558808562345803
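A quick sanity check on these averages: if the per-worker import time grows linearly with n, each doubling of n should roughly double the average, and the ratios do trend toward 2 (values below rounded from the output above):

avgs = [0.0665, 0.1119, 0.1175, 0.1490, 0.2082,
        0.3272, 0.5661, 1.0340, 1.8990, 3.5588]
# Ratio between successive doublings of the worker count
for a, b in zip(avgs, avgs[1:]):
    print(round(b / a, 2))  # 1.68, 1.05, 1.27, 1.4, 1.57, 1.73, 1.83, 1.84, 1.87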
Extra details about the environment in which I ran this:

- python version: 3.8.6
- pip list:

  Package    Version
  ---------- -------
  numpy      1.20.1
  pip        21.0.1
  setuptools 53.0.0
  wheel      0.36.2

- os:
  - NAME="Pop!_OS"
  - VERSION="20.10"
Is it just reading from the filesystem that is the problem?
I've added a simple test where, instead of importing, I just read the numpy files and do some sanity-check calculations:
import subprocess

cmd = 'python read_numpy.py'
for n in range(5):
    m = 2**n
    print(f"Running on {m} Process(es):")
    processes = []
    for i in range(m):
        processes.append(subprocess.Popen(cmd, shell=True))
    for p in processes:
        p.wait()
    print()
with read_numpy.py:
import os
import time

file_path = "/home/.virtualenvs/multiprocessing-import/lib/python3.8/site-packages/numpy"

t1 = time.time()
parity = 0
for root, dirs, filenames in os.walk(file_path):
    for name in filenames:
        contents = open(os.path.join(root, name), "rb").read()
        parity = (parity + sum([x % 2 for x in contents])) % 2
print(parity, time.time() - t1)
Running this gives me the following output:
Running on 1 Process(es):
1 0.8050086498260498
Running on 2 Process(es):
1 0.8164374828338623
1 0.8973987102508545
Running on 4 Process(es):
1 0.8233649730682373
1 0.81931471824646
1 0.8731539249420166
1 0.8883578777313232
Running on 8 Process(es):
1 0.9382946491241455
1 0.9511561393737793
1 0.9752676486968994
1 1.0584545135498047
1 1.1573944091796875
1 1.163221836090088
1 1.1602907180786133
1 1.219961166381836
Running on 16 Process(es):
1 1.337137222290039
1 1.3456192016601562
1 1.3102262020111084
1 1.527071475982666
1 1.5436983108520508
1 1.651414394378662
1 1.656200647354126
1 1.6047494411468506
1 1.6851506233215332
1 1.6949374675750732
1 1.744239330291748
1 1.798882246017456
1 1.8150532245635986
1 1.8266475200653076
1 1.769331455230713
1 1.8609044551849365
There is some slowdown: 0.805 seconds for 1 worker, and between 0.819 and 0.888 seconds for 4 workers. Compare that with import: 0.077 seconds for 1 worker, and between 0.126 and 0.154 seconds for 4 workers. It seems like there might be something other than filesystem reads slowing down import.
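To dig further, one option (available in CPython 3.7+) is the -X importtime flag, which prints a per-module import-time breakdown to stderr; comparing the breakdown for a single process against 16 concurrent ones should show which modules absorb the extra time. A sketch, with importtime_$$.log as an arbitrary log-file name:

import subprocess

# Each shell expands $$ to its own PID, so every process writes its own log
cmd = 'python -X importtime -c "import numpy" 2> importtime_$$.log'
processes = [subprocess.Popen(cmd, shell=True) for _ in range(16)]
for p in processes:
    p.wait()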