
I am working on Python code that uses multiprocessing. Here is the code:

import logging
import multiprocessing
import os

# module-level logger so that square() can use it inside the worker processes
logger = logging.getLogger(__name__)

def square(n):
    #logger.info("Worker process id for {0}: {1}".format(n, os.getpid()))
    logger.info("Evaluating square of the number {0}".format(n))
    print('process id of {0}: {1}'.format(n, os.getpid()))
    return n * n

if __name__ == "__main__":
    # input list
    mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

    # creating a pool object with 4 worker processes
    p = multiprocessing.Pool(4)

    # map list to target function
    result = p.map(square, mylist)

    print(result)

My server has 4 CPU cores. If I pass 4 to the Pool, only a single process appears to be started. It should start 4 separate processes, right?

If I set the value to 8 in the Pool object, this is the output I get:

process id of 1: 25872

process id of 2: 8132

process id of 3: 1672

process id of 4: 27000

process id of 6: 25872

process id of 5: 20964

process id of 9: 25872

process id of 8: 1672

process id of 7: 8132

process id of 10: 27000

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

This started 5 separate processes (25872, 8132, 1672, 27000, 20964) even though there are only 4 CPU cores.

  1. I don't understand why the pool initiated only 1 process when the value is 4 and initiated 5 separate processes when the value is 8.

  2. Can pool object be instantiated with a value greater than the number of CPU cores?

  3. Also what should be the optimal value we should use while instantiating pool object if a list contains a million records?

I have been through official python documentation, but I couldn't find info. Please help

Netwave
sudhir

1 Answer


Let's answer one by one.

  1. I don't understand why the pool initiated only 1 process when the value is 4 and initiated 5 separate processes when the value is 8.

The pool did start the number of worker processes you asked for: 4 with `Pool(4)` and 8 with `Pool(8)`. Do not confuse the number of cores with the number of processes; the two are independent. The PIDs you print are only those of workers that actually ran a task (the main Python process never calls `square`, so its PID is not among them). When fewer PIDs show up than workers, the tasks are most likely finishing so quickly that the first workers to start handle all of them before the others pick anything up. That is why you see 5 PIDs with 8 workers and a single PID with 4.
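To make this visible, here is a minimal sketch, assuming each task takes long enough (an artificial `time.sleep(0.1)`) that one worker cannot drain the queue on its own. The name `square_with_pid` is made up for this illustration; it returns the worker's PID alongside the result so the parent can count which workers took part.

```python
import multiprocessing
import os
import time


def square_with_pid(n):
    # Artificial delay (an assumption for the demo): without it the task is so
    # cheap that the first worker to start may process every item by itself.
    time.sleep(0.1)
    return n * n, os.getpid()


if __name__ == "__main__":
    print("parent pid:", os.getpid())

    with multiprocessing.Pool(4) as pool:
        results = pool.map(square_with_pid, range(1, 11), chunksize=1)

    squares = [sq for sq, _ in results]
    worker_pids = sorted({pid for _, pid in results})

    print(squares)
    # Only workers that actually ran a task show up here; the parent never does.
    print("worker pids that ran a task:", worker_pids)
```

With the delay in place you would typically see several distinct worker PIDs, none of them equal to the parent's; the exact PIDs and how the items are spread will differ from run to run.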

  2. Can pool object be instantiated with a value greater than the number of CPU cores?

Yes, you can ask for any number of workers you want (although the OS may impose a limit). Note, however, that for CPU-bound work more processes than cores will just oversubscribe the CPU: the extra processes compete for the same cores instead of adding throughput. More explanation below.
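If you want to check this on your own machine, a rough sketch like the following times the same CPU-bound job with one worker per core and with twice as many. The helpers `burn` and `timed_map` and the task sizes are made up for this illustration, and the timings include pool start-up, so treat the numbers as a hint rather than a benchmark.

```python
import multiprocessing
import os
import time


def burn(n):
    # Pure CPU work: sum of squares below n.
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_map(workers, tasks):
    # Time pool start-up plus one map() call with the given number of workers.
    start = time.perf_counter()
    with multiprocessing.Pool(workers) as pool:
        pool.map(burn, tasks)
    return time.perf_counter() - start


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    tasks = [300_000] * 16  # arbitrary workload chosen for the demo

    for workers in (cores, cores * 2):
        print(f"{workers} workers: {timed_map(workers, tasks):.2f}s")
```

On CPU-bound work like this, the larger pool is not expected to be faster, but results depend on the machine and on what else is running.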

  3. Also what should be the optimal value we should use while instantiating pool object if a list contains a million records?

Usually the "optimal" value is the one that keeps all your CPU cores fully busy. So, if you have 4 cores, 4 processes is a good starting point for CPU-bound work, although it is not always exactly right, so measure with your real workload. The size of the list does not change this: a million records are simply split among the same workers.
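As a sketch of what that could look like for a million records: size the pool from `os.cpu_count()` and pass an explicit `chunksize` so items travel to the workers in batches. `pick_chunksize` is a hypothetical helper written for this example, and its `chunks_per_worker=4` default is only an assumed starting point to tune.

```python
import multiprocessing
import os


def square(n):
    return n * n


def pick_chunksize(n_items, n_workers, chunks_per_worker=4):
    # Hypothetical helper: aim for a few chunks per worker so the work stays
    # balanced while the number of inter-process messages stays small.
    return max(1, -(-n_items // (n_workers * chunks_per_worker)))  # ceiling division


if __name__ == "__main__":
    workers = os.cpu_count() or 1  # one worker per core as a starting point
    records = range(1_000_000)
    chunksize = pick_chunksize(len(records), workers)

    with multiprocessing.Pool(workers) as pool:
        result = pool.map(square, records, chunksize=chunksize)

    print(len(result), result[:5])
```

For a function as cheap as `square`, the cost of moving data between processes can outweigh the computation itself, so it is worth timing this against a plain loop before committing to a pool.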

One last note,

I have been through official python documentation, but I couldn't find info.

This is not really Python-specific; it is general behavior in computing, since how processes are scheduled onto cores is up to the operating system.

Netwave
  • Thanks for the response. So if value 4, will all cpu cores be utilized? – sudhir May 30 '19 at 08:48
  • @sudhir, that depends on the OS, but it should. – Netwave May 30 '19 at 08:52
  • You mentioned "The pool initiated 4 processes" when pool object is initiated with value 4 but why I see only 1 process id? it should have started separate process with different process id on each core right – sudhir May 30 '19 at 11:07
  • @sudhir, yes, it should start 4 new processes – Netwave May 30 '19 at 11:09
  • process id of 1: 26648 process id of 2: 26648 process id of 3: 26648 process id of 4: 26648 process id of 5: 26648 process id of 6: 26648 process id of 7: 26648 process id of 8: 26648 process id of 9: 26648 process id of 10: 26648 – sudhir May 30 '19 at 11:10
  • Above is the output I got when I initiated with value 4. I see only 1 process id – sudhir May 30 '19 at 11:11
  • process id of 1: 26372 process id of 2: 21576 process id of 3: 26112 process id of 4: 18364 process id of 5: 25952 process id of 6: 19344 process id of 7: 6600 process id of 8: 27300 process id of 9: 12652 process id of 10: 25488 Output when value is 20 – sudhir May 30 '19 at 11:14
  • @sudhir, try with 4 and forcing `chunksize=1` in the call to map, `result = p.map(square, mylist, chunksize=1)` – Netwave May 30 '19 at 11:16
  • I tried result = p.map(square, mylist, chunksize=1) still printing single process id process id of 1: 19016 process id of 2: 19016 process id of 3: 19016 process id of 4: 19016 process id of 5: 19016 process id of 6: 19016 process id of 7: 19016 process id of 8: 19016 process id of 9: 19016 process id of 10: 19016 – sudhir May 30 '19 at 11:23
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/194173/discussion-between-netwave-and-sudhir). – Netwave May 30 '19 at 11:23