
Note: I have already used "Limit total CPU usage in python multiprocessing" as a reference.

I have some 200 Excel files whose rows sum to nearly 30k records. So for an efficient solution I used the multiprocessing.Pool method.

Sample code here:

from multiprocessing import Pool

def MainFunction(eachFile):
    # All the processing here (pandas, numpy, ...) takes around 25 secs per file
    ...

pool = Pool(maxtasksperchild=1)   # worker count defaults to os.cpu_count()
s = pool.map(MainFunction, MainListFiles)
pool.close()
pool.join()
del pool
pool = None

For 30k rows, the above code consumes around 475% CPU.
For 20k rows, around 290%.
For 10k rows, around 150%.

So the number of rows is directly proportional to CPU usage.

This is deployed in Pivotal Cloud Foundry, where other components also run using multiprocessing. It heavily stresses the CPU of the instance, so the container crashes every time. The only possible fix is limiting or controlling the CPU so that it won't go above 250%.
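
For reference, the kind of hard cap I'm after would look roughly like the sketch below: a fixed worker count plus, on Linux, pinning the processes to a couple of cores with os.sched_setaffinity so the OS enforces the ceiling. The core IDs and the reading of "250%" as 2-3 cores are assumptions.

import os
from multiprocessing import Pool

ALLOWED_CORES = {0, 1}   # hypothetical core IDs; gives a ~200% CPU ceiling

def _pin_worker():
    # Linux-only: restrict this worker to the allowed cores
    os.sched_setaffinity(0, ALLOWED_CORES)

os.sched_setaffinity(0, ALLOWED_CORES)   # pin the parent process too
pool = Pool(processes=len(ALLOWED_CORES), initializer=_pin_worker,
            maxtasksperchild=1)
s = pool.map(MainFunction, MainListFiles)
pool.close()
pool.join()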

Stuff I've tried:

  1. Using time.sleep()

    Split the list into two halves and slept for 15 secs between the two map calls. But no luck (see the batching sketch after this list).

    List1 = MainListFiles[:len(MainListFiles)//2]
    List2 = MainListFiles[len(MainListFiles)//2:]
    pool = Pool(maxtasksperchild=1)
    s = pool.map(MainFunction, List1)   # blocks until all of List1 is done
    time.sleep(15)
    s1 = pool.map(MainFunction, List2)
    pool.close()
    pool.join()
    del pool
    pool = None
    
  2. Also used the multiprocessing.Process() approach from the multiprocessing module. No luck: CPU went up to 400%, since one process is started per file, all at once (again, see the sketch after this list).

    processes = []
    for files in MainListFiles:
        s = multiprocessing.Process(target=MainFunction, args=(files,))
        processes.append(s)
        s.start()
    for process in processes:
        process.join()
    
  3. Tried with Pool(processes=os.cpu_count()//2)

    No effect: the elapsed time increased and the CPU still peaked.
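
Rethinking attempts 1 and 2: pool.map() blocks until the whole input list is finished, so the sleep never overlaps with running workers, and one Process() per file starts all ~200 at once. The batching sketch mentioned above would bound concurrency to a fixed batch size (the batch size and pause are assumed values):

import time
from multiprocessing import Process

BATCH_SIZE = 2   # assumed budget: at most 2 busy workers ~ 200% CPU
PAUSE_SECS = 5   # assumed cool-down between batches

for i in range(0, len(MainListFiles), BATCH_SIZE):
    batch = MainListFiles[i:i + BATCH_SIZE]
    procs = [Process(target=MainFunction, args=(f,)) for f in batch]
    for p in procs:
        p.start()
    for p in procs:
        p.join()   # wait for the whole batch before starting the next one
    time.sleep(PAUSE_SECS)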

Please advise how I can control the CPU usage of .map() in multiprocessing, or whether any other method, like a queue or pipe, would work. (A rough queue variant I have in mind is sketched below.)
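
For context, the queue variant would be something like this sketch: a fixed number of worker processes pulling file names from a multiprocessing.Queue, so concurrency can never exceed the worker count (NUM_WORKERS = 2 is an assumed budget):

from multiprocessing import Process, Queue

NUM_WORKERS = 2   # assumed budget: ~200% CPU

def worker(q):
    while True:
        f = q.get()
        if f is None:   # sentinel: no more files
            break
        MainFunction(f)

q = Queue()
workers = [Process(target=worker, args=(q,)) for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()
for f in MainListFiles:
    q.put(f)
for _ in workers:
    q.put(None)   # one sentinel per worker
for w in workers:
    w.join()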

    Use `Pool(processes=n)` where `n` is some number less than `os.cpu_count()` – juanpa.arrivillaga Aug 26 '20 at 17:28
  • Already tried; no luck. Gave it 4 processes instead of 8. – user11646543 Aug 26 '20 at 17:34
  • Have you tried ``Pool(processes=n)`` with an actual fixed ``n`` instead of e.g. ``os.cpu_count()//2``? Are you certain that ``os.cpu_count()`` returns the number of cores *accessible to you* (i.e. the container), not the entire system outside of the container? – MisterMiyagi Aug 26 '20 at 18:00
  • See also [BPO 36054: On Linux, os.count() should read cgroup cpu.shares and cpu.cfs (CPU count inside docker container)](https://bugs.python.org/issue36054) – MisterMiyagi Aug 26 '20 at 18:07
  • @MisterMiyagi Tried with Pool(processes=4); the container has 8 cores in total. Still no luck: the CPU % is still high, and I can see the execution takes longer. – user11646543 Aug 27 '20 at 06:23
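
Following up on MisterMiyagi's cgroup pointer, a sketch for deriving the container's effective CPU budget from the cgroup v1 CFS files (the paths below are the standard cgroup v1 layout; they may differ in a given container):

import os
from multiprocessing import Pool

def effective_cpu_count():
    # cgroup v1 CFS quota/period; a quota of -1 means "no limit"
    try:
        with open('/sys/fs/cgroup/cpu/cpu.cfs_quota_us') as f:
            quota = int(f.read())
        with open('/sys/fs/cgroup/cpu/cpu.cfs_period_us') as f:
            period = int(f.read())
        if quota > 0:
            return max(1, quota // period)
    except OSError:
        pass
    return os.cpu_count()   # fall back to the host's core count

pool = Pool(processes=effective_cpu_count(), maxtasksperchild=1)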
