I have many tasks to run; each can take up to 20 minutes and uses 100% of a CPU core. I am new to multiprocessing, and I decided to use joblib since it seems to let me multiprocess without threading (I have 12 cores and would like to run 12 processes at a time, starting new ones as the old ones finish, and I could not get this to work with Pool or mp.Process).
I am running Python 2.7 and have recreated a simple version of what is happening.
from joblib import Parallel, delayed
import numpy as np
from time import sleep

def do_something():
    print np.random.choice([0, 1])
    sleep(3)

if __name__ == '__main__':
    Parallel(n_jobs=3, backend='multiprocessing')(delayed(do_something)() for n in xrange(30))
The output always comes in sets of three, either '1 1 1' or '0 0 0', as if the number were generated once and then shared by all three workers. I thought that joblib.Parallel would simply call the function 30 separate times and use 3 cores to do so.
Is there a way to make it so that a new number is generated each time do_something() is called?
** edit: It turns out this is how the random generator behaves under multiprocessing: NumPy's global generator is seeded once in the parent process, and the forked worker processes each inherit a copy of that state, so parallel calls all draw the same sequence of numbers. Since I know how many times the function will be called in my real code, I solved this by generating a list of random numbers beforehand in the parent process and pulling from that list in each call.