Why do Python modules altered during execution persist over separate files?

Question

Sorry for the confusing title; let me explain what I mean. I came across a piece of code similar to the following that uses Google's PrettyTensor API, which allows custom functions to be added to the PrettyTensor class through its @prettytensor.Register() decorator.

(located in custom_ops.py)

import prettytensor as pt

@pt.Register(...)
def custom_foo(bar):
    ...

(located in main.py)

import prettytensor as pt
import custom_ops

x = pt.custom_foo(bar)

This code accesses prettytensor through 2 separate files, and I don't understand why the changes made in one file carry over to the other. What's also interesting is that the order of the imports doesn't matter.

import custom_ops
import prettytensor as pt

x = pt.custom_foo(bar)

The code above still works fine. I would like help finding an explanation for this phenomenon, as I could not find documentation for it anywhere. It seems to me like the python interpreter is caching the module in memory, and when it is altered by the custom_ops file it persists in the interpreter when it is imported again. If anyone knows why this happens, how would you stop it from occurring?

I'm sorry, I don't understand what your question is. What exactly is the unexpected behavior? — juanpa.arrivillaga, Jul 10 '17 at 22:17
modules are stored globally in `sys.modules`, imports in separate modules refer identically to the same module object and thus modifications are persisted — anthony sottile, Jul 10 '17 at 22:21
@AnthonySottile yes, to add to that, it isn't helpful to think of modules as "files". The files are source-code, which when executed create module-objects. These objects exist in memory, not in the `.py` files. — juanpa.arrivillaga, Jul 10 '17 at 22:24
@juanpa.arrivillaga right, but I would think that the expected behavior is for the interpreter to separate the global modules across files, or scopes i guess, so that one does not affect the other. — TheCoolManz, Jul 10 '17 at 22:32
No. There is *only one module*. The whole point of a module is to act as a namespace. You may want to create a class, and use different instances of that class in different modules. A module object is essentially like a static class. Again, it is not helpful to think of your program in terms of "files". Files are just places where your source code is stored. — juanpa.arrivillaga, Jul 10 '17 at 22:35
I'm not really sure what you even mean by "separate the global modules across files or scopes". In any event, the behavior you see *is what you want*. You wouldn't want 20+ copies of the `sys` or `os` or whatever module floating around in memory, all with different and conflicting state. That would be a mess. It sounds like you need to be using a different construct, like a class. — juanpa.arrivillaga, Jul 10 '17 at 22:38
@juanpa.arrivillaga I understand what you are saying, but since the namespaces in different (shall I use the dreaded word?) files do not conflict with each other usually (they are separate global scopes), I expected that instances of a module would also not conflict with each other. I suppose I was wrong, and I understand your explanation. Thanks for your help. — TheCoolManz, Jul 10 '17 at 22:51
@juanpa.arrivillaga also, since I can think of a scenario or two where this behavior is not the one you would want, (e.g. sys.stdout or sys.stderr have been cleared in one module but are necessary in another module), do you know how to prevent this from happening? — TheCoolManz, Jul 10 '17 at 22:53
Yes, but you are accessing the module namespace inside your file's namespace... do you see the distinction? So, when I do `import some_module`, the current global namespace sees `some_module`, but that is just the same namespace that any other file doing `import some_module` sees. — juanpa.arrivillaga, Jul 10 '17 at 22:54
Don't clear `sys.stdout`? You shouldn't be doing that anyway. Redirect it or something. Essentially, these are file objects accessible through the `sys` namespace. You can always open a new file. — juanpa.arrivillaga, Jul 10 '17 at 22:54

Blckknght · Answer 1 · 2017-07-10T23:34:09.723

The reason both your modules see the same version of the prettytensor module is that Python caches the module objects it creates when it loads a module for the first time. The same module object can then be imported any number of times in different places (or even several times within the same module, if you had a reason to do that), without being reloaded from its file.

You can see all the modules that have been loaded in the dictionary sys.modules. Whenever you do an import of a module that's already been loaded, Python will see it in sys.modules and you'll get a reference to the module object that already exists instead of a new module loaded from the .py file.
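To make the caching concrete, here is a minimal sketch. It does not use prettytensor; instead it registers a throwaway in-memory module under the made-up name shared_demo, and the two functions stand in for custom_ops.py and main.py from the question. All of those names are assumptions for illustration only.

```python
import importlib
import sys
import types

# Create an empty module object and put it in the import cache.
# "shared_demo" is a hypothetical name used only for this sketch.
shared = types.ModuleType("shared_demo")
sys.modules["shared_demo"] = shared


def register_op():
    # Plays the role of custom_ops.py: import the module and attach a function.
    import shared_demo
    shared_demo.custom_foo = lambda bar: bar * 2


def use_op():
    # Plays the role of main.py: a separate import statement elsewhere.
    import shared_demo
    return shared_demo.custom_foo(21)


register_op()
result = use_op()

# Every import of the name hands back the one object stored in sys.modules.
same_object = importlib.import_module("shared_demo") is shared
print(result, same_object)
```

Both import statements resolve to the single object held in sys.modules, which is why the attribute added in one place is visible in the other, whichever import runs first.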

In general, this is what you want. It's usually a very bad thing if two different parts of the code can get a reference to a module loaded from the same file via two different module names. For instance, you can have two objects that both claim to be instances of class foo.Foo, but they could be instances of two different foo.Foo classes if foo can be accessed two different ways. This can make debugging a real nightmare.

Duplicated modules can happen if your Python module search path is messed up (so that the modules inside a package are also exposed at the top level). It can also happen with the __main__ module (created from the file you're running as a script), which can also be imported using its normal name (e.g. main in your example with main.py).
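The foo.Foo problem can be reproduced on purpose by loading one source file under two different module names. This is only a sketch: the file foo.py, the second name pkg_foo, and the load_as helper are all invented for the example, and it bypasses the normal import statement by using importlib.util directly.

```python
import importlib.util
import os
import tempfile

source = "class Foo:\n    pass\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.py")
    with open(path, "w") as f:
        f.write(source)

    def load_as(name):
        # Hypothetical helper: execute the file at `path` as a fresh module
        # called `name`, without consulting or updating sys.modules.
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module

    foo_a = load_as("foo")
    foo_b = load_as("pkg_foo")

obj = foo_a.Foo()
same_class = foo_a.Foo is foo_b.Foo
is_instance_of_other = isinstance(obj, foo_b.Foo)
print(same_class, is_instance_of_other)
```

The two modules come from identical source, yet each execution of the file creates its own Foo class, so an instance of one is not an instance of the other. Something like this is also about as close as you get to a "private copy" of a module, which is a large part of why the shared cache is the default.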

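You can also manually reload a module using the reload function. In Python 2 this was a builtin; in Python 3 it lives in importlib, as importlib.reload. Note that reloading re-executes the module's code in the existing module object rather than creating a new one.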
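Here is a rough sketch of what a reload does in Python 3, using a temporary module named reload_demo (a made-up name, written to a temporary directory just for this example). It assumes the module is importable from sys.path in the usual way.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny module to disk; "reload_demo" is a hypothetical name.
    with open(os.path.join(tmp, "reload_demo.py"), "w") as f:
        f.write("value = 1\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import reload_demo

        # Alter the module at runtime, as a decorator like Register might.
        reload_demo.value = 99
        reload_demo.extra = "patched"

        # Re-run the module's source in the same module object.
        reloaded = importlib.reload(reload_demo)
    finally:
        # Clean up so the sketch leaves no trace in the interpreter.
        sys.path.remove(tmp)
        sys.modules.pop("reload_demo", None)

print(reloaded is reload_demo, reload_demo.value, reload_demo.extra)
```

The reload resets value because the source assigns it again, but extra survives because the source never mentions it. So a reload is not a clean way to undo runtime changes made by other code, and everyone who imported the module still sees the same object afterwards.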

Thanks! That helps a lot! — TheCoolManz, Jul 11 '17 at 04:17
