
I am trying to refine a very large JSON dataset. To do that, I split the file into many subparts (with the Unix `split` command) and assign each part to a process so that it can be fetched and refined independently. Each process has its own input file, which corresponds to a subset of the main dataset. Here is what my code looks like:

import multiprocessing as mp

def my_target(input_file, output_file):
    ...
    # some code that refines input_file into output_file
    ...
    # Is it possible to end the process here?
# end of the function

worker_count = mp.cpu_count()
# In the real code, each process gets its own (input_file, output_file) pair.
processes = [mp.Process(target=my_target, args=(input_file, output_file)) for _ in range(worker_count)]

for p in processes:
    p.start()

It is very likely that the processes won't terminate at the same time, hence my question: is it possible to terminate a process as soon as it reaches the last line of the target function my_target()?

I suppose that letting processes sit idle after they have finished their tasks could slow down the other processes, no?

Any recommendations?

Odess4
  • No, having idle processes probably won't hurt the performance of the others. However, you might want to look into the higher-level `mp.Pool`, so you can just `imap_unordered()` over your work items (see the sketch after these comments)... – AKX Nov 22 '21 at 19:32
  • You want to kill the other processes when a single one finishes? Otherwise, child processes already exit automatically (via `os._exit()`) when `target` returns. – Aaron Nov 23 '21 at 06:02
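
For reference, here is a minimal sketch of the `mp.Pool` approach suggested in the comments. The file names and the `refine()` function are hypothetical stand-ins for the split parts and the actual refinement logic:

import multiprocessing as mp

def refine(job):
    input_file, output_file = job
    # ... read input_file, refine it, write the result to output_file ...
    return output_file

if __name__ == "__main__":
    # Hypothetical part names as produced by the Unix split command.
    input_files = ["part_aa", "part_ab", "part_ac"]
    jobs = [(f, f + ".refined") for f in input_files]

    with mp.Pool(processes=mp.cpu_count()) as pool:
        # imap_unordered() hands a new job to a worker as soon as it frees
        # up and yields results in completion order, so no worker sits
        # idle while the others are still running.
        for finished in pool.imap_unordered(refine, jobs):
            print("done:", finished)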

1 Answer


I suggest you check this question, which relates to what you might need: how to terminate process using python's multiprocessing. You also have to take care of "zombie processes": if a process has ended but has never been joined, it can linger as a zombie.
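
A minimal sketch of the start/join pattern, assuming hypothetical per-process file pairs like those in the question:

import multiprocessing as mp

def my_target(input_file, output_file):
    ...  # refine input_file into output_file
    # Nothing special is needed here: the child process exits on its
    # own as soon as this function returns.

if __name__ == "__main__":
    # Hypothetical (input, output) pairs, one per split part.
    jobs = [("part_aa", "out_aa"), ("part_ab", "out_ab")]
    processes = [mp.Process(target=my_target, args=job) for job in jobs]

    for p in processes:
        p.start()

    # join() blocks until each child has exited and reaps it, so no
    # zombie process is left behind.
    for p in processes:
        p.join()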

  • Oh! Does joining the processes solve the problem? I have already done that in my code, but the program still takes a LOT of time to terminate, even with many processes, so I was wondering whether the join() method worked or not. – Odess4 Nov 22 '21 at 19:39
  • Python already has many safeguards in place to prevent zombie processes. It is actually quite difficult to accidentally create them anymore. – Aaron Nov 23 '21 at 06:03