
I am running a complicated Python method (a single run takes about 3-5 minutes) over multiple cases using `multiprocess`. The running speed is normal when I first kick off the program: I get about 20-30 outputs in 5 minutes using 30 cores. However, as time goes by, performance gradually degrades, to the point where I get only 5 outputs in 30 minutes. If I kill the program and re-run it, the same behavior repeats. What might be the reason? Is it because of overhead? What can I do? Please see the following sample code of the parallel program; I am reading pickles (class instances) in the code:

import multiprocess as mp
import os
import pickle

def run_all_cases(input_folder):
    pickle_files = os.listdir(input_folder)
    jobs = [(file, input_folder) for file in pickle_files]
    num_process = max(mp.cpu_count()-1, 1)
    with mp.Pool(processes=num_process) as pool:
        pool.starmap(run_single_case, jobs)

def run_single_case(file_name, input_folder):
    print(f"started {file_name} using {os.getpid()}")
    # pickle.load/dump need file objects, not path strings
    with open(os.path.join(input_folder, file_name), "rb") as f:
        data = pickle.load(f)
    # a complicated method in a class
    data.run_some_method()
    output_name = f"{file_name.split('_')[0]}_output.pkl"
    with open(output_name, "wb") as f:
        pickle.dump(data, f)
    print(f"finished {file_name} using {os.getpid()}")


Also, when I print out the process id, it changes over time. Is this expected (started file_8 using a new process id)? The output looks something like this (if using 5 cores):

started file_1 using core 8001
started file_2 using core 8002
started file_3 using core 8003
started file_4 using core 8004
started file_5 using core 8005
finished file_1 using core 8001
started file_6 using core 8001
finished file_2 using core 8002
started file_7 using core 8002
started file_8 using core 8006 #<-- it starts a new process id, rather than using the existing ones, is this expected?
finished file_3 using core 8003
...

=================================

UPDATE: I dug deeper into some instances where the process died after processing them. When I ran a single instance on its own, after some time the terminal said:

zsh: killed     python3 test.py

I guess this is the issue: a worker process gets killed, likely due to memory issues (or possibly other reasons), and the parent process doesn't start a new process to continue running the jobs, so the number of worker processes decreases over time, which explains the overall performance drop.
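If memory growth inside the workers is the culprit, one common mitigation (a sketch under that assumption, not a confirmed fix for this particular program) is to recycle workers with the Pool's `maxtasksperchild` argument, which both `multiprocessing` and the `multiprocess` fork support:

import os
import multiprocess as mp

def run_all_cases(input_folder):
    pickle_files = os.listdir(input_folder)
    jobs = [(file, input_folder) for file in pickle_files]
    num_process = max(mp.cpu_count() - 1, 1)
    # maxtasksperchild=1 tears down and replaces each worker after a
    # single task, so memory leaked by one case cannot accumulate
    # across cases in a long-lived worker process
    with mp.Pool(processes=num_process, maxtasksperchild=1) as pool:
        pool.starmap(run_single_case, jobs)  # run_single_case as defined above

Note this only helps with gradual memory accumulation; if a single case is large enough that the OS OOM killer sends SIGKILL mid-task, the pool may still lose that task's result or hang waiting for it.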

  • what is the memory utilization throughout the job? – haimlit Apr 14 '20 at 00:12
  • Without knowing what `data.run_some_method()` does, it may be impossible to say. – Iguananaut Apr 14 '20 at 00:15
  • @haimlit, it first utilizes around 95% of the memory (64 GB), then it starts decreasing – ilovecp3 Apr 14 '20 at 00:23
  • @Iguananaut, it has thousands of lines of code and I don't think it is suitable to paste here. But basically, it is a lot of pandas DataFrame operations on a df with about 50K-100K rows for each input pickle – ilovecp3 Apr 14 '20 at 00:24
  • Try narrowing down the problem into smaller pieces until you can find a [Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example). Without that it's kind of impossible to say what's going on. – Iguananaut Apr 14 '20 at 00:31

1 Answer


Try killing the old processes that are no longer used. It's getting a new process id because multiprocessing starts new processes, and those new processes get their own process ids. While the job is running, look at all the running processes on your system and compare how many instances of your Python program are running against how many you expect.
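To make that check concrete, here is a minimal sketch using the third-party psutil package (an assumption on my part; a plain `ps aux | grep python` in a shell shows the same thing). `test.py` stands in for the actual script name:

import psutil

# Count live processes whose command line mentions the script;
# "test.py" is a hypothetical placeholder for the real script name
workers = [
    p for p in psutil.process_iter(["pid", "cmdline"])
    if p.info["cmdline"] and any("test.py" in arg for arg in p.info["cmdline"])
]
print(f"{len(workers)} matching processes alive")

If this count keeps dropping below the pool size while the job runs, workers are dying without being replaced.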
