I am trying to learn Python multiprocessing.Pool.

import numpy as np
from multiprocessing import Pool


def topla(sayı):
    return sayı+2
    

def summ(number):
    results=[]
    array=np.linspace(0,number,20)
    if __name__ == "__main__":
        p = Pool(4)
        simulation=p.map(topla, array)
        results.append(simulation)
    return results
          
sum_res=summ(8)
print(sum_res)
            

When you run this code, you will see a list containing a single inner list, such as:

[[2.0, 2.4210526315789473, 2.8421052631578947, 3.263157894736842, 3.6842105263157894, 4.105263157894736, 4.526315789473684, 4.947368421052632, 5.368421052631579, 5.789473684210526, 6.2105263157894735, 6.631578947368421, 7.052631578947368, 7.473684210526315, 7.894736842105263, 8.31578947368421, 8.736842105263158, 9.157894736842104, 9.578947368421051, 10.0]]

However, trying to access sum_res[0] results in an index-out-of-range error:

File "C:\ProgramData\Anaconda3\lib\runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\BURGer\Desktop\eren\junk.py", line 26, in <module>
    print(sum_res[0])
IndexError: list index out of range

Similar errors keep appearing until I close the kernel. I am using Spyder with Python 3.9.7.

Do you know how to solve this problem? Thanks

  • Please post the actual error, not just a description of it. – Scott Hunter May 12 '23 at 12:06
  • Are you using a Jupyter notebook? If yes: Multiprocessing and notebooks aren't the easiest fit. [Search for related questions](https://stackoverflow.com/search?q=Jupyter+notebook+multiprocessing). – Timus May 12 '23 at 12:40
  • You neglected to mention that without the `print(sum_res[0])` you see two empty lists also being printed. That should have been a clue as to what was causing your problem. – Booboo May 12 '23 at 18:48

1 Answer

First, as an aside, you did not specify your platform (Windows? Linux? Roll-your-own?), as you are directed to do when posting multiprocessing questions. If the print(sum_res) statement were being executed only once (that is, no empty lists were being printed), it would imply that the fork method for creating child processes is being used (the default under Linux). With the spawn method, however, I would expect the print statement to be executed by the main process and by the child processes, and a print(sum_res[0]) statement should raise an IndexError in each child process, which is what you are getting. So I conclude that you are running under Windows, macOS, or some other platform that uses the spawn method, and that you have not mentioned that without the print(sum_res[0]) statement you first see two empty lists being printed.
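If you are unsure which start method your platform uses, you can ask the multiprocessing module directly. The following is only a minimal sketch; describe_start_method is a hypothetical helper name, not something from your code:

```python
import multiprocessing as mp


def describe_start_method():
    """Return the default start method and whether it is 'spawn'."""
    # get_start_method() reports how child processes are created on this
    # platform, for example "fork" or "spawn".
    method = mp.get_start_method()
    return method, method == "spawn"


if __name__ == "__main__":
    method, uses_spawn = describe_start_method()
    print(f"start method: {method} (spawn: {uses_spawn})")
```

If this reports spawn, everything in the rest of this answer applies to your setup.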

When the spawn method is being used, each child process must first be initialized by launching a new instance of the Python interpreter, which reads in the source again and executes every statement at the global level (e.g. function definitions, import statements, variable assignments). However, when this new child process executes, the value of __name__ is no longer "__main__", so you can test __name__ to guard any statements that you do not want the child processes to execute. So far so good?

You do not have the following global statements within such an if __name__ == '__main__': block, and so they will be executed by the child processes as well as by the main process:

sum_res=summ(8)
print(sum_res[0])

So summ(8) will be called by the child processes. But within summ, the code that would append data to the empty results list is inside an if __name__ == '__main__': block, so it never gets executed there; summ returns an empty list, which leads to your IndexError.
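You can reproduce this effect without starting any processes. The sketch below uses runpy.run_path (the same function that appears in your traceback) to execute a tiny script twice, once under the name "__main__" and once under "__mp_main__", which is the name a spawned child gives the re-imported main module. The script text, the file name junk_demo.py and the helper run_as are made up for illustration:

```python
import runpy
import tempfile
import textwrap
from pathlib import Path

# A cut-down stand-in for the script in the question (no Pool involved).
SCRIPT = textwrap.dedent('''
    def summ():
        results = []
        if __name__ == "__main__":
            results.append("filled in by the main process")
        return results

    sum_res = summ()
''')


def run_as(run_name):
    """Execute SCRIPT with __name__ set to run_name and return its sum_res."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "junk_demo.py"
        path.write_text(SCRIPT)
        namespace = runpy.run_path(str(path), run_name=run_name)
    return namespace["sum_res"]


print(run_as("__main__"))     # ['filled in by the main process']
print(run_as("__mp_main__"))  # []
```

The second call returns an empty list, so indexing it with [0] fails in exactly the way your child processes fail.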

Also, the multiprocessing.Pool.map method returns a list, and it is this list, simulation, that you are appending to an empty list. Therefore, results ends up being a list with a single item, which is itself a list. What you probably want is:

import numpy as np
from multiprocessing import Pool


def topla(n):
    return n + 2


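def summ(number):
    arr = np.linspace(0, number, 20)
    # The with block cleans up the worker processes once map has returned.
    with Pool(4) as p:
        # map already returns a list, so return it directly.
        return p.map(topla, arr)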

if __name__ == "__main__":
    sum_res = summ(8)
    print(sum_res)

Prints:

[2.0, 2.4210526315789473, 2.8421052631578947, 3.263157894736842, 3.6842105263157894, 4.105263157894736, 4.526315789473684, 4.947368421052632, 5.368421052631579, 5.789473684210526, 6.2105263157894735, 6.631578947368421, 7.052631578947368, 7.473684210526315, 7.894736842105263, 8.31578947368421, 8.736842105263158, 9.157894736842104, 9.578947368421051, 10.0]
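The difference between appending the list that map returns and returning it directly can also be seen in isolation. This sketch swaps in multiprocessing.pool.ThreadPool, which offers the same map interface, purely so that it runs without creating child processes; nested and flat are hypothetical names for the two variants:

```python
from multiprocessing.pool import ThreadPool


def topla(n):
    return n + 2


def nested(values):
    """Mimic the question: append map's result to another list."""
    results = []
    with ThreadPool(2) as pool:
        simulation = pool.map(topla, values)
    results.append(simulation)  # adds the whole list as ONE element
    return results


def flat(values):
    """Return map's result as it is."""
    with ThreadPool(2) as pool:
        return pool.map(topla, values)


print(nested([0, 1, 2]))  # [[2, 3, 4]]
print(flat([0, 1, 2]))    # [2, 3, 4]
```

With the nested version you would need sum_res[0][0] to reach the first number; with the flat version sum_res[0] is enough.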
Booboo