Note: I've already used the question "Limit total CPU usage in python multiprocessing" as a reference.
I have around 200 Excel files whose rows add up to nearly 30k records. For an efficient solution I used the multiprocessing.Pool method.
Sample code here:
from multiprocessing import Pool

def MainFunction(eachFile):
    # all the processing/operations happen here with pandas, numpy, etc.
    # and take about 25 seconds per file
    ...

pool = Pool(maxtasksperchild=1)             # default worker count is os.cpu_count()
s = pool.map(MainFunction, MainListFiles)   # MainListFiles holds the ~200 Excel file paths
pool.close()
pool.join()
del pool
pool = None
The above code consumes around 475% CPU for 30k rows, around 290% for 20k rows, and around 150% for 10k rows (100% here means one fully busy core). So the number of rows is directly proportional to the CPU usage.
This is deployed in Pivotal Cloud Foundry, where other components that also use multiprocessing are running. It stresses the CPU of the instance so heavily that the container crashes every time, and the only possible fix is limiting or controlling the CPU so it won't rise above 250%.
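To make the target concrete, this is roughly the behavior I'm after (a sketch only; the worker count of 2 is my assumption that each worker saturates about one core, which would keep the total near 200%):

from multiprocessing import Pool

N_WORKERS = 2  # assumption: ~1 core per worker, so ~200% total, under the 250% budget

with Pool(processes=N_WORKERS, maxtasksperchild=1) as pool:
    results = pool.map(MainFunction, MainListFiles)  # MainFunction/MainListFiles as above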
Stuff I've tried:
Using time.sleep(): split the files into two lists and slept for 15 seconds between the two map calls. No luck.

List1 = MainListFiles[:len(MainListFiles)//2]
List2 = MainListFiles[len(MainListFiles)//2:]
pool = Pool(maxtasksperchild=1)
s = pool.map(MainFunction, List1)
time.sleep(15)
s1 = pool.map(MainFunction, List2)
pool.close()
pool.join()
del pool
pool = None
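My understanding of why this fails: each map call still fans out to the default number of workers (one per core), so the CPU spikes during each half and the sleep only adds idle time between the spikes. A chunked variant I sketched (untested; the chunk size and delay are made-up numbers) would at least bound each burst:

import time
from multiprocessing import Pool

CHUNK = 2   # hypothetical burst size, also used as the worker count
DELAY = 15  # seconds of idle time between bursts

results = []
with Pool(processes=CHUNK, maxtasksperchild=1) as pool:
    for i in range(0, len(MainListFiles), CHUNK):
        results.extend(pool.map(MainFunction, MainListFiles[i:i + CHUNK]))
        time.sleep(DELAY)  # give the container's CPU a chance to settle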
Used the multiprocessing.Process() method from the multiprocessing module as well. No luck; CPU went to 400%, presumably because this loop starts one process per file, all at once.

processes = []
for files in MainListFiles:
    s = multiprocessing.Process(target=multiprocessing_func, args=(files,))
    processes.append(s)
    s.start()
for process in processes:
    process.join()
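If raw Process objects are the way to go, the only fix I can think of is bounding concurrency with a semaphore (a sketch, untested; the SEM_LIMIT of 2 is my assumption for staying under 250%):

import multiprocessing

SEM_LIMIT = 2  # assumed cap: at most 2 files being processed at any moment

def bounded_worker(sem, f):
    with sem:            # blocks until one of the SEM_LIMIT slots frees up
        MainFunction(f)

sem = multiprocessing.Semaphore(SEM_LIMIT)
processes = []
for f in MainListFiles:
    p = multiprocessing.Process(target=bounded_worker, args=(sem, f))
    processes.append(p)
    p.start()            # every process starts, but only SEM_LIMIT do real work at once
for p in processes:
    p.join()

The downside is that all ~200 processes exist at the same time (most just blocked on the semaphore), so this trades CPU pressure for memory.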
Tried Pool(processes=os.cpu_count()//2). No effect: the elapsed time grew and the CPU still peaked.
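One suspicion about why halving the worker count didn't cap the usage: the work inside MainFunction uses pandas/numpy, and the underlying BLAS library can spin up several threads per worker process, so even a small pool can exceed the expected percentage. A sketch of pinning those library threads to one per process (these environment variables are the standard OpenMP/OpenBLAS/MKL knobs, and they must be set before numpy is first imported):

import os

# Assumption: the CPU overshoot comes from BLAS threading inside each worker.
# Set these before importing numpy/pandas so each process uses a single thread.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np   # imported only after the limits are in place
import pandas as pd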
Please advise how I can control the CPU usage of .map() in multiprocessing, or whether any other method, such as a queue or pipe, would work.
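For concreteness, this is the kind of queue-based version I have in mind (a sketch only, untested; N_WORKERS is again my assumed cap):

import multiprocessing

N_WORKERS = 2  # assumed cap on concurrent workers

def queue_worker(task_q, result_q):
    # pull file paths until the sentinel (None), processing each with MainFunction
    for f in iter(task_q.get, None):
        result_q.put(MainFunction(f))

task_q = multiprocessing.Queue()
result_q = multiprocessing.Queue()
for f in MainListFiles:
    task_q.put(f)
for _ in range(N_WORKERS):
    task_q.put(None)  # one sentinel per worker so each one exits

workers = [multiprocessing.Process(target=queue_worker, args=(task_q, result_q))
           for _ in range(N_WORKERS)]
for w in workers:
    w.start()

results = [result_q.get() for _ in range(len(MainListFiles))]  # drain before joining
for w in workers:
    w.join()

Only N_WORKERS processes ever exist here, so both CPU and memory would stay bounded no matter how many files are queued. Would this work, or is there a better approach?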