2

I have a script that runs daily and spawns 100+ processes. Sometimes 2 or 3 of those processes randomly fail to start and my target function is never called. This started happening only recently; until a few days ago it ran fine. There is no pattern to which processes get skipped either: it happens on random days, and a random number of processes fail. My implementation is as below:

import multiprocessing

class Worker:
    def __init__(self, count, file):
        self.file = file
        self.count = count

    def invoker(self):
        print(f"Process {self.count} invoker called")

proc_list = []
for i in range(5):
    worker = Worker(i, file)  # Different Worker obj for every process
    process = multiprocessing.Process(target=worker.invoker)
    proc_list.append(process)
    process.start()

for proc in proc_list:
    proc.join()

On all successful days I see logs like: Process 16 invoker called. Process 37 invoker called. Process 9 invoker called. ... Process 100 invoker called., covering all values from 1 to 100 in some order. BUT on some days every value is there except, say, "Process 9 invoker called.", and for that process my invoker does nothing at all.

At the code level everything looks fine to me, but the OS does not let the process spawn. Is there anything I am missing or can check? Please help, thanks!

I have tried adding logs in the invoker function, and logging process-level details before and after starting each process.

  • There's not enough info here to be able to tell anything. If the `print` statement was called in this example, it was called from a child process, so the issue is not that the process wasn't created. I will say, however, that `print` statements to `sys.stdout` can get mangled when multiple processes write at the same time. It may be better to use `logging` and follow the tutorial for using a queue handler to log from multiple processes. That way you know your log messages won't get mangled and are going to the correct stream (some IDEs redirect stdout and don't tell child processes about it). – Aaron Nov 08 '22 at 16:45
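A minimal sketch of the queue-based logging setup the comment refers to (the worker body here is a hypothetical stand-in for the real invoker): each child pushes records onto a shared queue, and a `QueueListener` in the parent writes them out one at a time, so lines from different processes cannot interleave.

```python
import logging
import logging.handlers
import multiprocessing

def invoker(queue, count):
    # each child sends log records to the shared queue instead of stdout
    root = logging.getLogger()
    root.addHandler(logging.handlers.QueueHandler(queue))
    root.setLevel(logging.INFO)
    root.info("Process %s invoker called", count)

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    # the listener runs in the parent and emits records one at a time,
    # so messages from different processes can't get mangled mid-line
    listener = logging.handlers.QueueListener(queue, logging.StreamHandler())
    listener.start()

    procs = [multiprocessing.Process(target=invoker, args=(queue, i))
             for i in range(5)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    listener.stop()
```

This mirrors the "logging to a single file from multiple processes" pattern in the Python logging cookbook.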

2 Answers

2

Firstly, you are not actually checking what is going on. After joining all the processes, you should inspect their `exitcode` attributes to make sure they all executed correctly. Checking the exit status of your processes is always good practice.

Secondly, you claim you are starting more than a hundred processes, and the example code you provide suggests you are starting them all at the same time. What is most likely happening is that your OS cannot start that many processes at once, and a few of them fail (most likely killed due to OOM) before ever running your function. Hence you don't see it being executed.
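A sketch of the exit-status check described above, using a toy target function in place of the real one: after joining, `Process.exitcode` is 0 on success, positive if the child raised or called `sys.exit(n)`, and negative if it was killed by a signal (e.g. -9 when the OOM killer sends SIGKILL).

```python
import multiprocessing

def invoker(count):
    print(f"Process {count} invoker called")

if __name__ == "__main__":
    proc_list = [multiprocessing.Process(target=invoker, args=(i,))
                 for i in range(5)]
    for proc in proc_list:
        proc.start()
    for proc in proc_list:
        proc.join()

    # any non-zero exitcode means that child never finished normally
    failed = [p for p in proc_list if p.exitcode != 0]
    for p in failed:
        print(f"{p.name} failed with exitcode {p.exitcode}")
```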

noxdafox
  • 14,439
  • 4
  • 33
  • 45
0

I am having the same issue. I am logging my code as much as I can, but can't find anything that explains this phenomenon. I really think it has something to do with the OS (Linux - Ubuntu in my case), as @noxdafox has mentioned.

I used this code just before starting (`.start()`) each process or thread, and now I am not running into any issues anymore. Probably because the OS gets time to start each process individually without it interfering with the start of another.

from numpy.random import default_rng
from time import sleep

rng = default_rng()
# generates a single scalar value greater than or equal to 0
# but less than 3
time_to_sleep = rng.uniform(0, 3)
sleep(time_to_sleep)

# Start the individual process or thread
process.start()

Code reference: https://stackoverflow.com/a/72267613/7513730
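For context, a sketch of how that delay slots into the question's start loop (using the stdlib `random` module instead of numpy, a short toy delay, and a hypothetical invoker in place of the real one):

```python
import multiprocessing
import random
from time import sleep

def invoker(count):
    print(f"Process {count} invoker called")

if __name__ == "__main__":
    proc_list = []
    for i in range(5):
        process = multiprocessing.Process(target=invoker, args=(i,))
        # stagger the starts so the OS never has to spawn
        # all of the processes in the same instant
        sleep(random.uniform(0, 0.2))
        process.start()
        proc_list.append(process)

    for proc in proc_list:
        proc.join()
```

Note the trade-off: with 100+ processes and a delay of up to 3 seconds each, the random sleeps can add several minutes to the total startup time.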

chainstair
  • 681
  • 8
  • 18