Actually, you will have 8 instances of SomeKindOfClass (one for each worker), but only 4 will ever be active at the same time.
multiprocessing vs multiprocessing.dummy
Your program will only work if you continue to use the multiprocessing.dummy module, which is just a wrapper around the threading module. You are still using "Python threads" (not separate processes). Python threads share the same global state; processes don't. Python threads also share the same GIL, so they're still limited to executing one Python bytecode instruction at a time, unlike processes, which can all run Python code simultaneously.
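For example, this minimal sketch (the fill function and results list are illustrative names, not from your code) shows thread workers all mutating one shared global list:

from multiprocessing.dummy import Pool  # thread-backed Pool

results = [None] * 4  # one global list, shared by every thread

def fill(i):
    results[i] = i * i  # all threads write into the same list

Pool(4).map(fill, range(4))
print(results)  # [0, 1, 4, 9]: the main thread sees the workers' writes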
If you were to change your import to from multiprocessing import Pool, you would notice that the allOutputs array remains unchanged after all the workers finish executing (also, you would likely get an error, because you're creating the pool in the global scope; you should put that inside a main() function). This is because multiprocessing makes a new copy of the entire global state when it starts a new process. When a worker modifies the global allOutputs, it is modifying a copy of that initial global state. When the process ends, nothing is returned to the main process, and the global state of the main process remains unchanged.
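You can verify this with a small sketch (again, illustrative names rather than your code):

from multiprocessing import Pool

outputs = [None] * 4  # global state, copied into each worker process

def fill(i):
    outputs[i] = i * i  # mutates the worker's private copy

if __name__ == '__main__':
    Pool(4).map(fill, range(4))
    print(outputs)  # still [None, None, None, None] in the main process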
Sharing State Between Processes
Unlike threads, processes don't share the same memory.
If you want to share state between processes, you have to explicitly declare shared variables and pass them to each process, or use pipes or some other method to allow the worker processes to communicate with each other or with the main process.
There are several ways to do this, but perhaps the simplest is using the Manager class:
import multiprocessing

def worker(args):
    # each worker receives its index and a proxy to the shared list
    index, array = args
    a = SomeKindOfClass()
    a.some_expensive_function()
    array[index] = a.output  # writes go through the manager to the real list

def main():
    n = 8
    manager = multiprocessing.Manager()
    array = manager.list([0] * n)  # a list that lives in the manager process
    pool = multiprocessing.Pool(4)
    pool.map(worker, [(i, array) for i in range(n)])
    print(array)

if __name__ == '__main__':
    main()
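As a side note: when each worker only needs to hand back a single result, as in your case, you can avoid shared state altogether and let pool.map collect the return values for you. A sketch of that variant, reusing SomeKindOfClass and allOutputs from your code (and assuming a.output is picklable):

import multiprocessing

def worker(_):
    a = SomeKindOfClass()
    a.some_expensive_function()
    return a.output  # pickled and sent back to the main process

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    allOutputs = pool.map(worker, range(8))  # results arrive in order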