
I'm new to Python programming (using v.3.8.8) and have a very basic question on accessing global variables in a multithreading program using processes. For example, I have the following simple code:

```python
from multiprocessing import Process

global_var = 777

def workerThread():
    global global_var

    my_file = open("out_file",'w')
    print(global_var, file=my_file)
    my_file.close()

    return


if __name__ == '__main__':
    global_var = 999

    if (True):
        # use multiprocessing, but launch only one thread
        procs = Process(target=workerThread, args=())
        procs.start()
        procs.join()
    else:
        # standard function call, so these two paths should be equivalent
        workerThread()
```

I can execute it with two code paths. "True" runs it in multithreaded mode (but with only one thread) and "False" calls the function directly, so the two are essentially equivalent.

Therefore, I had assumed that both would give the same behavior and output "999", because in both cases workerThread() is called after the global variable has been set to 999 (I write the output to "out_file" because stdout does not print when multithreading).

But for some reason, the multithreaded approach outputs the original value of 777 while the direct function call outputs 999. This doesn't make any sense to me. Why are they different? How do I fix this?

I initially thought of adding `global global_var` right after `if __name__ == '__main__':` in order to ensure that it's setting the global variable (and not a local variable by the same name), but that didn't make sense because the "if" statement is not a separate function, but rather part of the "main" function. Indeed, when I tried this, it gave me a syntax error "SyntaxError: name 'global_var' is assigned to before global declaration". So that's clearly not the right answer.

So I am not sure what is going on, and how to do this correctly. It seems like a very simple thing that should be easy to do, but I'm completely stuck and have looked around but found no answer. Any suggestions?

Finally, at the end of the day, I don't want to use a global variable declared in this file, but rather a global variable declared in another file, say "my_globals.py", so I can access the same global variable from multiple files as the program runs. So I had first tried accessing it within the workerThread as my_globals.global_var but that also didn't work, which led me to simplify until I got to this code.

And in that case, the statement `global my_globals.global_var` is invalid (syntax error), so I am not sure how to guarantee that the workerThread function would use the global variable from the other file. Or is it automatically considered a global variable because it's an attribute of my_globals? Trying to figure out if there is something like `extern` in C...

Thanks in advance for your help with these n00b questions. I'm just not very good at Python.

--Miguel

Miguel
  • Your program is not using multithreading. This is important to understand. – juanpa.arrivillaga Aug 20 '21 at 17:44
  • And yes, "is it automatically considered a global variable because it's an attribute of my_globals" that is how it works. Note, in your current setup, `global global_var` *does nothing useful*. It would be treated like a global variable anyway. – juanpa.arrivillaga Aug 20 '21 at 17:47
  • In any case, I suspect you are on Windows, which uses spawn. The point I made above, that this is *multiprocessing*, is important: you are **creating a new, separate Python process**. So when you do `procs = Process(target=workerThread, args=()); procs.start()`, the current module is *loaded again in a new Python process*, and since it isn't `__main__` there, it never reaches `global_var = 999` – juanpa.arrivillaga Aug 20 '21 at 17:48
  • Sorry I meant to say "multiprocessing" and used that term in most places. I am using Windows, but I guess I don't understand how spawning works. I would have thought that it would only run the function you are calling (workerThread) not the whole module. So if I want the program to run on a Windows machine, how do I do it? – Miguel Aug 20 '21 at 18:54
  • What you are trying to do *is not trivial*: sharing mutable state across subprocesses is *hard*, especially if you want to do it right. You should [read the docs about sharing state with multiprocessing](https://docs.python.org/3/library/multiprocessing.html#sharing-state-between-processes) – juanpa.arrivillaga Aug 20 '21 at 22:17
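Following the sharing-state section of the docs linked above, here is a minimal sketch of two common fixes. It assumes the worker either only needs the value the parent holds when the process starts (pass it as an argument, which is pickled and sent to the child) or must also write a result back (use `multiprocessing.Value`, which lives in shared memory). The names `worker_with_arg`, `worker_with_value` and `run` are hypothetical.

```python
import multiprocessing as mp

global_var = 777  # module level: a spawned child re-executes this line


def worker_with_arg(value, out_path):
    # `value` was pickled by the parent and sent to the child, so this
    # writes the parent's current value under both spawn and fork.
    with open(out_path, "w") as f:
        print(value, file=f)


def worker_with_value(shared):
    # `shared` wraps a C int in shared memory; the change is visible to the
    # parent after join().
    with shared.get_lock():
        shared.value += 1


def run(out_path="out_file"):
    global global_var
    global_var = 999  # only the parent's copy changes here

    p = mp.Process(target=worker_with_arg, args=(global_var, out_path))
    p.start()
    p.join()
    with open(out_path) as f:
        written = f.read().strip()

    shared = mp.Value("i", global_var)  # "i" is the typecode for a C int
    p = mp.Process(target=worker_with_value, args=(shared,))
    p.start()
    p.join()
    return written, shared.value


if __name__ == "__main__":
    print(run())  # ('999', 1000)
```

Passing an argument is enough for read-only data; `Value` (or the other shared types in the docs) is only needed when the child must change something the parent will read.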

3 Answers


When an operating system uses spawn to create new processes, as Windows does, it creates a new address space and launches a new Python interpreter that starts executing (interpreting) your source program from the very first line of code. This means that every executable statement at global scope will be executed. However, in this new process, the variable `__name__` will no longer be `'__main__'` (CPython's multiprocessing runs the script under the name `'__mp_main__'`). This is why you put code that creates new processes in a block that begins with `if __name__ == '__main__':`; if you did not, each new process would try to create new processes of its own, recursively (current versions of multiprocessing detect this during the child's start-up and raise a `RuntimeError` instead of looping forever).

Let's look at your code a bit closer. Your main process executes `global_var = 999` right before it creates the new process. But as I just described, when the new process starts executing, it runs in a new address space that does not inherit the main process's variables, and all the global definitions in the source file are executed before your function `workerThread` gets invoked. One of those global definitions is `global_var = 777`. The other assignment to `global_var` does not get executed because `__name__` is no longer `'__main__'`. And that is why you see the results you do.
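To make that sequence concrete without starting any process, here is a minimal simulation: it runs a copy of the question's script top to bottom in a fresh namespace, once as the parent sees it and once as a spawned child does. It assumes, as CPython's multiprocessing does for spawn, that the child runs the main script under the name `'__mp_main__'`. `SOURCE` and `run_as` are hypothetical names, and `workerThread` returns the value instead of writing a file.

```python
# Hypothetical copy of the question's script, reduced to the relevant lines.
SOURCE = """
global_var = 777

def workerThread():
    return global_var

if __name__ == '__main__':
    global_var = 999
"""


def run_as(module_name):
    # Execute the whole source from the first line in a fresh namespace,
    # the way a newly started interpreter would, with __name__ = module_name.
    namespace = {"__name__": module_name}
    exec(SOURCE, namespace)
    return namespace["workerThread"]()


print(run_as("__main__"))     # parent: the guarded assignment runs, prints 999
print(run_as("__mp_main__"))  # spawned child: the guard is skipped, prints 777
```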

You need to move the assignment `global_var = 999` outside of the `if` statement:

```python
global_var = 999
if __name__ == '__main__':
    ...  # etc.
```

But remember: your subprocess is working on a copy of this global. If workerThread modifies the global running as a subprocess, this change will not be reflected back in the main process's copy.
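A small sketch of the "copy" point, assuming your platform's default start method. The child changes its copy of a hypothetical global `counter` and reports it back through a `multiprocessing.Queue`; the parent's copy is unaffected either way. The child's value also shows the spawn/fork difference juanpa.arrivillaga mentions in the comments.

```python
import multiprocessing as mp

counter = 0  # module-level global; every process ends up with its own copy


def child(queue):
    global counter
    counter += 100      # changes only the child's copy
    queue.put(counter)  # send the child's view back explicitly


def run():
    global counter
    counter = 5         # changes only the parent's copy
    queue = mp.Queue()
    p = mp.Process(target=child, args=(queue,))
    p.start()
    child_view = queue.get()  # drain the queue before join(), per the docs' guidelines
    p.join()
    return child_view, counter


if __name__ == "__main__":
    # spawn (e.g. Windows): the child re-imports the module, so (100, 5)
    # fork: the child inherits counter == 5, so (105, 5)
    print(run())
```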

Booboo
  • Yes, although it is important to point out that even on Linux, where you may be able to use a forked subprocess, it is still an *independent process*; it's just that the memory is inherited in a copy-on-write fashion. Practically speaking, copy-on-write is just a copy when referenced in CPython, due to reference counting mechanisms – juanpa.arrivillaga Aug 20 '21 at 22:03
  • Thanks for your replies. If the respawned program starts from the first line of code, what makes it go into workerThread() at all? If I comment out everything below the `if __name__ == '__main__':` statement (which wouldn't be executed in the spawned version because as @juanpa.arrivillaga pointed out, it's not `__main__`), the program literally does nothing because workerThread() is never executed. So what exactly is executed when you spawn? Why does it execute workerThread() and not other functions? Sorry about the n00b question, but I'm struggling to understand Python's execution model. – Miguel Aug 20 '21 at 22:05
  • @Miguel it's a subprocess, it always is even if you use fork, it's just that if you forked it would have just inherited the memory into the new process (copy-on-write). In both cases, it is important to understand, you have two different processes with two different variables. – juanpa.arrivillaga Aug 20 '21 at 22:11
  • @Miguel and this really doesn't have anything to do with *Python's* execution model: a new process is spawned, and it loads the module. The multiprocessing machinery handles interprocess communication, basically by pickling objects and sending them across the wire; the subprocesses accept jobs from the main process and execute them on demand. – juanpa.arrivillaga Aug 20 '21 at 22:16
  • Yes, I now understand that. They are different memory spaces, essentially. However, following @Booboo's example I moved the global_var = 999 outside the `if __name__ == '__main__':` and that worked. But I don't understand why it worked. Why is all the code outside the workerThread() function being executed, instead of just what is inside workerThread()? Does it execute all that first, and then execute workerThread()? How does it know to execute workerThread() and not some other arbitrary function I might have in the file (I tried that too)? – Miguel Aug 20 '21 at 22:21
  • But why is the stuff in the "global scope" executed when it spawns, and not just the stuff inside the workerThread function? And is everything in this global scope executed before the workerThread function, even if it is after the function? And how does it know which function to execute? Sorry, I don't understand how Python is executing this (I'm calling that the "execution model", maybe that's the wrong term here). – Miguel Aug 20 '21 at 22:59
  • The fact that the source is executed from the top is a byproduct of how a new process is spawned and not the principal mechanism, which is that the `Process` instance is serialized to the new address space along with copies of open file descriptors and other resources the main process has that need to be duplicated. Needless to say, the complete spawning process is quite complicated. – Booboo Aug 21 '21 at 11:20
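To illustrate the serialization point, and Miguel's question of how the child knows which function to run, here is a sketch using `pickle` directly. It assumes the `Process` target is pickled the way any module-level function is: by reference (module name plus qualified name), not by code. This illustrates the mechanism and is not multiprocessing's internal code; `workerThread` mirrors the question and `can_pickle` is a hypothetical helper.

```python
import pickle


def workerThread():
    return "ran"


def can_pickle(obj):
    # True if pickle can serialize obj; False if it cannot find it by name.
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        return False
    return True


# A module-level function is pickled as a reference: the bytes contain its
# module and name, not its body. Unpickling looks that name up again.
payload = pickle.dumps(workerThread)
print(b"workerThread" in payload)             # True
print(pickle.loads(payload) is workerThread)  # True: found again by name

# A lambda has no importable name, so it cannot be sent by reference.
anonymous = lambda: "ran"
print(can_pickle(workerThread), can_pickle(anonymous))  # True False
```

In a spawned child that lookup can only succeed after the module has been imported again, which is why the module-level statements (including `global_var = 777`) run before `workerThread` is called, and why only the function named in `target` runs.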

I think the easiest way to use global variables for child processes on Windows is to delete the `if __name__ == '__main__':` line. Everything should work fine.

Leonardo

Try this:

```python
from multiprocessing import Process
from my_globals import *

def workerThread():
    global global_var

    my_file = open("out_file",'w')
    print(global_var, file=my_file)
    my_file.close()

    return


if __name__ == '__main__':
    global_var = 2

    if (True):
        # use multiprocessing, but launch only one thread
        procs = Process(target=workerThread, args=())
        procs.start()
        procs.join()
    else:
        # standard function call, so these two paths should be equivalent
        workerThread()
```

It works for me.

haha2567
  • This doesn't work for me. The code runs, but it outputs the original global_var value when using multiprocessing instead of 2. So that's not right. I'm running this on Windows, so as @juanpa.arrivillaga suggested maybe this is what is causing the problem? If so, how do I fix it so that it will run on Windows? I need to use global vars for certain things because there are many files and lots of functions and it is easier to put global_vars in a few places rather than pass new arguments through the whole program structure. Any advice would be appreciated. – Miguel Aug 20 '21 at 18:50
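For the multi-file case, juanpa.arrivillaga's point that module attributes already act as globals can be sketched as follows. The sketch builds a hypothetical `my_globals` module with `types.ModuleType` only to stay self-contained; normally it would be a `my_globals.py` file. This sharing holds within one process only: a spawned child imports `my_globals` afresh and sees the file's original value, so runtime values still have to be handed to the child, for example as `Process` arguments.

```python
import sys
import types

# Hypothetical stand-in for a my_globals.py file containing `global_var = 777`.
_stand_in = types.ModuleType("my_globals")
_stand_in.global_var = 777
sys.modules["my_globals"] = _stand_in

import my_globals  # resolved from sys.modules, like any already-imported module


def set_value(value):
    # No `global` statement needed: this assigns an attribute on the shared
    # module object rather than rebinding a name in this file.
    my_globals.global_var = value


def read_value():
    import my_globals as g  # every import in this process returns the same object
    return g.global_var


set_value(999)
print(read_value())  # 999
```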