The code example below runs as I expected on two Linux machines: with Python 3.6.8 on a large CentOS-based server (Red Hat 4.8.5-39 kernel), and with Python 3.7.3 on my MX-based box (Debian 8.3.0-6 kernel).
$ python3 testshared.py filename.dat
filename.dat
270623586670000.0
However, on my Mac running Mojave 10.14.6 with Python 3.8.3, I get an error because foo = [] in function processBigFatRow(). Note that foo is assigned in getBigFatData() before the process pool is started. It's as if, on Linux, the version of foo assigned in getBigFatData() is passed to the processes, while on Mac the processes just use the initialization at the top of the code (which I have to put there so that foo and bar are global variables).
I understand that processes are "independent copies" of the main process, and that you can't assign a global variable in one process and expect it to change in another. But what about variables that are already set before the parallel processes are started, and that the processes only read? It looks as if process copies are not created the same way across OSs. Which one is working "as designed"?
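One thing that may be worth comparing between the machines is the multiprocessing start method, since it determines whether a child inherits the parent's memory (fork) or re-imports the main module from scratch (spawn). The multiprocessing documentation notes that the macOS default changed from fork to spawn in Python 3.8, which would be consistent with what I see, though I have not confirmed that this is the cause. A minimal diagnostic sketch follows; the helper name describe_start_method is just an illustrative name, not something from my script:

```python
import multiprocessing as mp
import platform


def describe_start_method():
    """Return (python version, OS name, default start method, available methods)."""
    return (
        platform.python_version(),
        platform.system(),
        mp.get_start_method(),        # e.g. 'fork', 'spawn' or 'forkserver'
        mp.get_all_start_methods(),   # every method this platform supports
    )


if __name__ == "__main__":
    version, system, default, available = describe_start_method()
    print(f"Python {version} on {system}: default={default}, available={available}")
```

Running this on each machine should show whether the Linux boxes and the Mac create worker processes the same way.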
Code example:
import pylab as pl
from concurrent import futures
import sys

foo = []
bar = []

def getBigFatData(filename):
    global foo, bar
    # get the big fat data
    print(filename)
    foo = pl.arange(1000000).reshape(1000,1000)
    # compute something as a result
    bar = pl.sum(foo, axis=1)

def processBigFatRow(row):
    total = pl.sum(foo[row,:]**2) if row % 5 else bar[row]
    return total

def main():
    getBigFatData(sys.argv[1])
    grandTotal = 0.
    rows = pl.arange(100)
    with futures.ProcessPoolExecutor() as pool:
        for tot in pool.map(processBigFatRow, rows):
            grandTotal+=tot
    print(grandTotal)

if __name__ == '__main__':
    main()
EDIT:
As suggested, I tested Python 3.8.6 on my MX-Linux box, and it works.
So it works on Linux using Python 3.6.8, 3.7.3 and 3.8.6. But it doesn't on Mac using Python 3.8.3.
EDIT 2:
From multiprocessing doc:
On Unix a child process can make use of a shared resource created in a parent process using a global resource.
So it won't work on Windows (and it's not the best practice), but shouldn't it work on Mac?
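For comparison, here is a sketch of a variant that does not rely on workers inheriting globals from the parent: the data is handed to each worker explicitly through the initializer and initargs parameters of ProcessPoolExecutor (available from Python 3.7). This is only an assumption about what a portable version could look like, not a confirmed fix. The names init_worker, process_row, _foo and _bar are hypothetical, and small plain lists stand in for the big arrays so the sketch needs nothing beyond the standard library.

```python
from concurrent import futures

# Per-process storage; filled in by init_worker() inside each worker process.
_foo = None
_bar = None


def init_worker(foo, bar):
    """Runs once in every worker and stores the data in that worker's globals."""
    global _foo, _bar
    _foo, _bar = foo, bar


def process_row(row):
    """Same logic as processBigFatRow(), but reading the per-worker copies."""
    if row % 5:
        return sum(x * x for x in _foo[row])
    return _bar[row]


def main():
    # Stand-in data: a 10x10 table and its row sums (illustrative values only).
    foo = [[r * 10 + c for c in range(10)] for r in range(10)]
    bar = [sum(row) for row in foo]
    with futures.ProcessPoolExecutor(
        initializer=init_worker, initargs=(foo, bar)
    ) as pool:
        grand_total = sum(pool.map(process_row, range(len(foo))))
    print(grand_total)


if __name__ == "__main__":
    main()
```

The trade-off is that when workers are not forked, the arguments are pickled and sent to every worker, so for really big data this costs memory and startup time.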