
I have this code:

import os

pid = os.fork()

if pid == 0:
    os.environ['HOME'] = "rep1"
    external_function()
else:
    os.environ['HOME'] = "rep2"
    external_function()

and this code:

import os
from multiprocessing import Process, Pipe

# external_function() and some_data are defined elsewhere in my program.

def f(conn):
    os.environ['HOME'] = "rep1"
    external_function()
    conn.send(some_data)
    conn.close()

if __name__ == '__main__':
    os.environ['HOME'] = "rep2"
    external_function()
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print(parent_conn.recv())
    p.join()

The external_function initializes an external program by creating the necessary sub-directories in the directory named by the HOME environment variable. This function does this work only once in each process.
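To make the "only once in each process" behaviour concrete, here is a hypothetical stand-in for external_function written in Python 3. The real function (pyvle's getInstalledPackages, mentioned in the comments) may track its state differently; the module-level flag and the sub-directory names are assumptions for illustration only.

```python
import os

# Hypothetical stand-in for external_function. The real function
# (pyvle's getInstalledPackages) may track "already done" differently;
# here a module-level flag makes the setup run at most once per process.
_initialized = False


def external_function(subdirs=("pkgs", "config")):
    """Create sub-directories under $HOME, but only on the first call in this process."""
    global _initialized
    if _initialized:
        return False  # already initialized in this process: do nothing
    home = os.environ["HOME"]
    for name in subdirs:  # "pkgs" and "config" are made-up names
        os.makedirs(os.path.join(home, name), exist_ok=True)
    _initialized = True
    return True
```

Calling it a second time in the same process, even after HOME has changed, creates nothing.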

With the first example, which uses os.fork(), the directories are created as expected. But with the second example, which uses multiprocessing, only the directories in rep2 get created.

Why isn't the second example creating directories in both rep1 and rep2?

Benyamin Jafari
Hunsu
  • I can't reproduce the issue you're seeing. I wrote an `external_function` like this: `def external_function(): print os.environ['HOME']`, and found that the `multiprocessing` example printed out exactly what I expected it to; 'rep1', 'rep2' and the string I sent back from `conn.send` were all printed. – dano Jun 04 '14 at 16:00
  • I think you will find the answer [here](http://stackoverflow.com/questions/2276117/python-multiprocessing-process-vs-standalone-python-vm). –  Jun 04 '14 at 16:10
  • The function I execute is `getInstalledPackages` from https://github.com/vle-forge/pyvle/blob/master/src/pyvle.py. – Hunsu Jun 04 '14 at 16:12
  • @andi That's what I suspected. If you write an answer I will accept it. – Hunsu Jun 04 '14 at 16:14
  • I will do that. Give me a minute. –  Jun 04 '14 at 16:15

2 Answers

The answer you are looking for is addressed in detail here, along with an explanation of the differences between operating systems.

One big issue is that the fork system call does not exist on Windows, so on Windows you cannot use os.fork at all. multiprocessing is a higher-level interface for executing part of the currently running program in another process. On Unix it does this, as forking does, by creating a copy of your process's current state. That is to say, it takes care of forking your program for you.

Therefore, where it is available, you can consider os.fork() a lower-level interface for forking a program, and the multiprocessing library a higher-level interface to forking.
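The lower-level/higher-level split is easiest to see side by side. A minimal Python 3 sketch for Unix only; child_work is a hypothetical placeholder for whatever the child should do. With os.fork you branch on the return value, leave the child with os._exit and reap it with os.waitpid yourself; multiprocessing.Process (here pinned to the "fork" start method) does that bookkeeping for you.

```python
import multiprocessing
import os


def child_work():
    # Hypothetical placeholder for the child's job.
    pass


def run_with_fork():
    """Low level: fork, run the work in the child, and reap it in the parent."""
    pid = os.fork()
    if pid == 0:
        # Child process: run the work, then exit without running the parent's code.
        code = 0
        try:
            child_work()
        except Exception:
            code = 1
        os._exit(code)
    _, status = os.waitpid(pid, 0)  # parent waits for the child
    return os.WEXITSTATUS(status)


def run_with_multiprocessing():
    """High level: multiprocessing does the fork/exit/wait bookkeeping."""
    ctx = multiprocessing.get_context("fork")  # Unix only, like os.fork
    p = ctx.Process(target=child_work)
    p.start()
    p.join()
    return p.exitcode
```

Both functions return the child's exit status, 0 when child_work finishes normally.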

Asclepius
  • Please, if anyone else has the same question as the user above, this answer doesn't provide anything but another link to go down. If you could put the relevant information in your answer, as the community guidelines request you do, I will remove my downvote. – Poik Jun 04 '14 at 16:25
  • I hope this is what you are asking for. Otherwise let me know. Thanks for your criticism. –  Jun 04 '14 at 16:40
  • It actually doesn't create a copy of the current process in Windows. Otherwise, all the caveats listed [here](https://docs.python.org/2/library/multiprocessing.html#windows) wouldn't be a problem. See my answer for more details. – Poik Jun 04 '14 at 16:49
  • 'takes care of forking for you' ... what if my program forks within lock. Is deadlock expected? If there's a layer of abstraction between me and the forking, the details are important. – user48956 Apr 16 '18 at 20:03

To answer your question directly, there must be some side effect of external_function that makes running the two calls in series give different results than running them at the same time. This comes from how you set up your code, not from any difference between os.fork and multiprocessing.Process, which behave essentially the same on systems where os.fork is supported.

The only real differences between os.fork and multiprocessing.Process are portability and library overhead: os.fork is not supported on Windows, and the whole multiprocessing framework has to be loaded to make multiprocessing.Process work. On Unix, multiprocessing.Process is built on os.fork, as this answer backs up.
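Since os.fork only exists on Unix, portable code has to choose how to create the child. Python 3.4+ (newer than the code in this thread) exposes that choice as a multiprocessing "start method". A minimal sketch, assuming you want fork semantics wherever the OS provides them; pick_start_method and start are hypothetical helper names.

```python
import multiprocessing
import os


def pick_start_method():
    """Return 'fork' where the OS provides it, otherwise 'spawn' (e.g. on Windows)."""
    if hasattr(os, "fork") and "fork" in multiprocessing.get_all_start_methods():
        return "fork"  # child is a copy of the parent, as with os.fork
    return "spawn"  # child is a fresh interpreter that re-imports the main module


def start(target, *args):
    """Start target(*args) in a child process using the chosen start method."""
    ctx = multiprocessing.get_context(pick_start_method())
    p = ctx.Process(target=target, args=args)
    p.start()
    return p
```

Under "spawn" the target and its arguments must be picklable and the calling code must sit behind an `if __name__ == '__main__':` guard; under "fork" neither is required.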

The important distinction, then, is that os.fork copies everything in the current process using Unix's fork, so at the moment of forking both processes are identical except for their PIDs. On Windows this is emulated by starting a fresh interpreter and re-importing your main module, which reruns its module-level setup code but not the if __name__ == '__main__': block. That is roughly the same as creating a subprocess with the subprocess library.
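A small Unix-only Python 3 sketch of "identical except for their PIDs": a module-level variable changed in the parent before the child is created is visible in the child, the child gets its own PID, and its parent PID is the caller's PID. counter, report and fork_snapshot are illustrative names, not part of any library.

```python
import multiprocessing
import os

counter = 0  # module-level state that the parent changes after import


def report(conn):
    # Runs in the forked child, which sees the parent's memory as it was at fork time.
    conn.send((os.getpid(), os.getppid(), counter))
    conn.close()


def fork_snapshot():
    """Return (child PID, child's parent PID, child's view of counter)."""
    global counter
    counter = 42  # modified in the parent *before* the child is created
    ctx = multiprocessing.get_context("fork")  # Unix only
    parent_conn, child_conn = ctx.Pipe()
    p = ctx.Process(target=report, args=(child_conn,))
    p.start()
    result = parent_conn.recv()  # receive before join to avoid blocking the child
    p.join()
    return result
```

Under the Windows-style "spawn" method the child would re-import the module and see counter == 0 instead.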

In your case, the two snippets do fairly different things: in the second one you call external_function in the main process before starting the new process, so the two calls run in series (one after the other) in different processes instead of concurrently. The pipe is also unnecessary, since nothing in the first snippet corresponds to it.
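Running in series matters if external_function keeps per-process state. The Python 3 sketch below assumes, hypothetically, that it remembers having run through a module-level flag. With the fork start method that flag is copied into the child, so calling the function in the parent first turns the child's call into a no-op, which would produce exactly "only rep2 gets created". This illustrates one possible side effect; it is not a diagnosis of pyvle.

```python
import multiprocessing
import os
import tempfile

# Hypothetical stand-in for external_function: a module-level flag makes the
# setup run at most once per process. The real function may work differently.
_initialized = False


def external_function():
    global _initialized
    if _initialized:
        return  # this process (or the one it was forked from) already did it
    os.makedirs(os.path.join(os.environ["HOME"], "setup"), exist_ok=True)
    _initialized = True


def child(home):
    os.environ["HOME"] = home
    external_function()


def run(parent_first):
    """Return (rep1 created?, rep2 created?) for the given call order."""
    global _initialized
    _initialized = False  # fresh state so both orders can be tried in one process
    base = tempfile.mkdtemp()
    rep1, rep2 = os.path.join(base, "rep1"), os.path.join(base, "rep2")
    ctx = multiprocessing.get_context("fork")  # Unix only, like os.fork
    if parent_first:
        # Order of the question's second snippet: parent sets the flag, then forks.
        os.environ["HOME"] = rep2
        external_function()
    p = ctx.Process(target=child, args=(rep1,))
    p.start()  # the child starts with a copy of the parent's memory, flag included
    if not parent_first:
        # Order of the os.fork snippet: fork first, then the parent does its setup.
        os.environ["HOME"] = rep2
        external_function()
    p.join()
    return os.path.isdir(rep1), os.path.isdir(rep2)
```

With parent_first=True only rep2 is created; starting the child first creates both directories.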

On Unix, the code snippets:

import os

pid = os.fork()

if pid == 0:
    os.environ['HOME'] = "rep1"
    external_function()
else:
    os.environ['HOME'] = "rep2"
    external_function()

and:

import os
from multiprocessing import Process

def f():
    os.environ['HOME'] = "rep1"
    external_function()

if __name__ == '__main__':
    p = Process(target=f)
    p.start()
    os.environ['HOME'] = "rep2"
    external_function()
    p.join()

should do exactly the same thing, apart from the small extra overhead of the multiprocessing library.

Without further information, we can't figure out what the issue is. If you can provide code that demonstrates the issue, that would help us help you.

Community
Poik
  • The example of Dano is not the same as what my function does. – Hunsu Jun 04 '14 at 16:16
  • But his comment is still valid, we cannot reproduce the behavior you are encountering without some input from you. – Poik Jun 04 '14 at 16:27
  • If you want to reproduce the behavior I'm encountering, you must install a bunch of packages. – Hunsu Jun 04 '14 at 16:33
  • If Dano's edit is correct, then we cannot answer your question without at least getting the names of the packages. – Poik Jun 04 '14 at 16:38
  • You can answer my question if you know the difference between them. – Hunsu Jun 04 '14 at 16:41
  • There are more issues here than the differences in the libraries. There may be side effects of `external_function` that we cannot account for, which is the only explanation for the behavior you describe that I know of. Additionally, as both my and Andi's answer say, the differences are that the `multiprocessing` library wraps `os.fork`. – Poik Jun 04 '14 at 16:55