How can I prevent top-level code from running on import?

Question

I want to test a Python module that depends on another Python module, which is badly behaved: it doesn't check __name__ and instead always runs some code at import time. That code accesses the file system and therefore gets in the way of unit testing.
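For contrast, here is a sketch of the __name__ check the dependency is missing. It is a hypothetical guarded version of the dependency (the real one, dep.py below, cannot be changed), executed with `exec` under two different `__name__` values. The guarded call only runs when the file is executed as a script, not when it is imported.

```python
import contextlib
import io

# Hypothetical guarded version of the dependency. The real dep.py has no
# such guard and cannot be changed; this only illustrates the idiom.
GUARDED_DEP_SOURCE = '''
def do_stuff():
    # access file system, open network connection, go to database
    print("I got called!")

if __name__ == "__main__":
    do_stuff()
'''


def run_with_name(module_name):
    """Execute the source with the given __name__ and return its stdout."""
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        exec(GUARDED_DEP_SOURCE, {"__name__": module_name})
    return captured.getvalue()


as_import = run_with_name("dep")       # what `import dep` would see
as_script = run_with_name("__main__")  # what `python dep.py` would see
```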

Here is a simplified version of the problem that just prints something:

test.py:

from unittest import TestCase
import tested

class MyTest(TestCase):
    def test_one(self):
        self.assertEqual(1, tested.one())

if __name__ == '__main__':
    import unittest
    unittest.main()

The code under test, tested.py:

import dep

def one():
    return 1

The badly behaved dependency, dep.py:

def do_stuff():
    # access file system, open network connection, go to database
    print("I got called!")

do_stuff()

When I run this, I see the printed message.

I got called!
.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK

How can I prevent do_stuff from running?

I tried mocking it, but by the time my @patch takes effect, the import tested at the top of test.py has probably already loaded the module, so do_stuff has already been called.

I also tried importing tested inside the test, after the @patch (and removed the top-level import tested from test.py), but do_stuff was still called and its output was still printed:

from unittest import TestCase
from unittest.mock import patch

class MyTest(TestCase):
    @patch("tested.dep.do_stuff")     # <-- too late? still prints
    def test_one(self, mocked_dep):
        import tested                 # <-- moving the import here did not help either
        self.assertEqual(1, tested.one())
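A small reproduction of why the patch arrives too late. It writes throwaway copies of dep.py and tested.py (assumed to match the files shown above) to a temporary directory and captures stdout. As far as I can tell from how `unittest.mock` works, `patch()` resolves its target lazily when the patch starts, and resolving `"tested.dep.do_stuff"` means importing `tested`, which imports `dep` and runs its top-level `do_stuff()` call before the mock exists. The decorator form behaves the same way, because it starts the patch just before the test body runs.

```python
import contextlib
import io
import sys
import tempfile
from pathlib import Path
from unittest.mock import patch

# Recreate the question's dep.py and tested.py in a throwaway directory
# (assumption: they are identical to the files shown above).
tmp = Path(tempfile.mkdtemp())
(tmp / "dep.py").write_text(
    'def do_stuff():\n'
    '    print("I got called!")\n'
    '\n'
    'do_stuff()\n'
)
(tmp / "tested.py").write_text("import dep\n\ndef one():\n    return 1\n")
sys.path.insert(0, str(tmp))
for name in ("dep", "tested"):
    sys.modules.pop(name, None)  # force a fresh import for this demo

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    # To find the object to replace, patch() imports "tested" (and therefore
    # "dep") when the patch starts, so dep's top-level do_stuff() call has
    # already run by the time the mock is installed.
    with patch("tested.dep.do_stuff") as mocked:
        import tested
        result = tested.one()
```

Afterwards `captured` still contains the message, and the mock itself was never called.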

I cannot change the behavior of dep.py as too much other code may depend on it.

How can I prevent do_stuff from getting called? Or is my only option to mock the functions I don't want do_stuff to call, in this case print?
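On the second question: patching what do_stuff calls, only for the duration of the import, does suppress the side effect, although do_stuff itself still runs. A minimal sketch, assuming print is the only side effect (in the real dep.py you would have to patch whichever file-system, network or database calls it makes; those are not known here):

```python
import contextlib
import io
import sys
import tempfile
from pathlib import Path
from unittest.mock import patch

# Throwaway copies of the question's dep.py and tested.py (assumed identical).
tmp = Path(tempfile.mkdtemp())
(tmp / "dep.py").write_text(
    'def do_stuff():\n'
    '    print("I got called!")\n'
    '\n'
    'do_stuff()\n'
)
(tmp / "tested.py").write_text("import dep\n\ndef one():\n    return 1\n")
sys.path.insert(0, str(tmp))
for name in ("dep", "tested"):
    sys.modules.pop(name, None)  # make sure dep.py really executes below

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    # Replace what do_stuff() calls (here builtins.print) only while the
    # import runs; do_stuff() is still called, but its effect is neutralised.
    with patch("builtins.print") as fake_print:
        import tested

value = tested.one()  # the real tested and dep modules are usable afterwards
```

The trade-off is that you have to know, and keep up to date, every call the import-time code makes.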

Why do you call `do_stuff` in the module? Modules are not supposed to have any top-level code. — nerdguy, Feb 04 '20 at 23:05
@nerdguy: It's perfectly OK for a module to have top-level code if it needs it. For example, it may need to initialize some things before it can be used. — martineau, Feb 04 '20 at 23:39
See [mocking a module import](https://stackoverflow.com/q/43162722/674039). — wim, Feb 05 '20 at 00:44
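A minimal sketch of mocking a module import by pre-seeding `sys.modules`, which is one common way to do it (not necessarily what the linked answer does). The import system returns a module that is already present in `sys.modules` without executing anything, so registering a stand-in under `"dep"` before `tested` is imported means dep.py never runs. The caveat, relevant to Robert's follow-up comment, is that `tested.dep` is then a `MagicMock`, so none of dep's real code is available during the test.

```python
import contextlib
import io
import sys
import tempfile
from pathlib import Path
from unittest import mock

# Throwaway copies of the question's modules (assumed identical).
tmp = Path(tempfile.mkdtemp())
(tmp / "dep.py").write_text(
    'def do_stuff():\n'
    '    print("I got called!")\n'
    '\n'
    'do_stuff()\n'
)
(tmp / "tested.py").write_text("import dep\n\ndef one():\n    return 1\n")
sys.path.insert(0, str(tmp))
for name in ("dep", "tested"):
    sys.modules.pop(name, None)

fake_dep = mock.MagicMock(name="dep")
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    # While the dict patch is active, `import dep` inside tested.py finds
    # fake_dep in sys.modules and never executes dep.py. patch.dict restores
    # sys.modules on exit, so "dep" and "tested" are removed again.
    with mock.patch.dict(sys.modules, {"dep": fake_dep}):
        import tested
        value = tested.one()
        dep_seen_by_tested = tested.dep
```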
@martineau That is not "perfectly OK". It's a code smell (mutable global state), that is generally avoidable in the first place. — wim, Feb 05 '20 at 00:45
@wim: Who said that the global state was *mutable*? It could just be a large table that’s better expressed as code than as a giant literal (and even assigning a literal—or function!—is implemented as “top-level code”). — Davis Herring, Feb 05 '20 at 03:47
Does your `tested.py` need to call into `dep` during the test, or is that `import` only needed for other situations? — Davis Herring, Feb 05 '20 at 03:50
Regardless, that's avoidable too (initialized explicitly, or initialized lazily on first use) — wim, Feb 05 '20 at 04:13
@DavisHerring I want to eventually use code in `dep.py` from `tested.py`, I just don't want that `do_stuff` to run. — Robert, Feb 05 '20 at 16:45
@wim: Explicit module initialization is even worse, since it makes clients that are themselves libraries guess whether to invoke it or not. Lazy initialization has a cost on every access. The simple answer is that it’s mutability and external interactions that are bad, not *execution* per se. — Davis Herring, Feb 05 '20 at 17:43
@DavisHerring I get the feeling you are not very familiar with Python. Explicit initialization would often be done with a context manager, no guesswork there, the init is explicitly part of the API in the first place (with "entering context"). And lazy initialization does not have a cost on _every_ access - it only moves the once-off init from time of import to time of first use. — wim, Feb 05 '20 at 17:58
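A sketch of the explicit-initialization style described in this comment, assuming a hypothetical restructured dependency (the names `initialized`, `_state` and `target` are invented for illustration): importing the module has no side effects, and a context manager performs the setup on entry and undoes it on exit.

```python
import contextlib

# Hypothetical well-behaved dependency: importing it has no side effects.
# Callers opt in to the setup by entering initialized(); "target" is a
# made-up configuration parameter standing in for e.g. DB credentials.
_state = {"connected": False, "target": None}


@contextlib.contextmanager
def initialized(target="dev-db"):
    _state.update(connected=True, target=target)  # stand-in for connecting
    try:
        yield _state
    finally:
        _state.update(connected=False, target=None)  # stand-in for cleanup


before = dict(_state)
with initialized("test-db") as state:
    during = dict(state)
after = dict(_state)
```

Because the target is passed in after import, a test can point it somewhere harmless instead of inheriting whatever the module would have connected to at import time.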
@wim: Sounds to me like maybe you're the one not that familiar with Python or at least some of its common idioms. Modules are very similar to singleton classes, and it's perfectly OK for one to initialize itself when the class statement is executed (which might not happen until it's first imported). — martineau, Feb 05 '20 at 19:54
@martineau No, it is not okay to "_open network connection, go to database_" (from OP) at import time, because eagerly executing such code will be problematic in testing and deployment (exactly as this question is showing). The place where those network connections should go or what the database credentials are should be configurable, *after* importing the code. If you don't understand the difference between a **script** and a **module** then that's your own lack of experience speaking. You'll probably learn this the hard way one day, after accidentally talking to a prod db from a dev box. — wim, Feb 05 '20 at 20:49
@DavisHerring I think we are probably on the same page, actually, but you may have missed that the O.P. actually _is_ talking about external interactions triggered at import time, not just some harmless top-level setup code. See the comment under "do_stuff" function. — wim, Feb 05 '20 at 20:59
@wim: Of course I don’t dispute that there are misuses of executing code on import; I was merely responding (indirectly) to nerdguy’s statement that (to paraphrase) there aren’t any *legitimate* uses of the feature. Surely this question arises from one of the bad uses! (By “explicit initialization” I meant something done once for the module, not once per use (as with `with`), and lazy initialization has to *check* whether it has happened every time.) — Davis Herring, Feb 06 '20 at 01:40
@DavisHerring In a dynamic language such as Python, it is simply **not true that lazy init has to check whether it has happened every time**. [Here](https://ideone.com/uemaRd) is an example using the descriptor protocol to avoid such a check. And [here](https://ideone.com/6tMGLF) is an example of doing the same thing in module scope. *These datamodel hooks are only invoked if an attr is not found by the usual means*. After init, the computed val is subsequently found the usual way, and there's no overhead on the lazy attr compared to looking up a regular attr (it *becomes* a normal attribute). — wim, Feb 06 '20 at 02:02
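A minimal sketch of the module-level variant, using a module `__getattr__` (PEP 562, Python 3.7+). The module name `lazy_dep`, the attribute `connection` and the helper `_connect` are hypothetical, and the module is built with `types.ModuleType` and `exec` so the example is self-contained. The hook only fires when normal lookup fails; it stores the result in the module namespace, so later accesses are ordinary attribute lookups. As the next comment points out, a bare global name used inside the module itself does not go through `__getattr__`.

```python
import types

# Source of a hypothetical lazily-initialised module.
LAZY_SOURCE = '''
calls = 0

def _connect():
    # stand-in for "access file system, open network connection, ..."
    global calls
    calls += 1
    return object()

def __getattr__(name):
    # Only invoked when "name" is not found in the module namespace.
    if name == "connection":
        value = _connect()
        globals()["connection"] = value  # later lookups find it directly
        return value
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''

lazy = types.ModuleType("lazy_dep")
exec(LAZY_SOURCE, lazy.__dict__)   # "importing" runs no setup

calls_after_import = lazy.calls    # still 0 here
first = lazy.connection            # triggers _connect() via __getattr__
second = lazy.connection           # plain lookup, no hook involved
```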
@wim: Fair enough, although the `__getattr__` is a bit ugly (and requires 3.7, and doesn’t work on code *in* the module) and the non-data descriptor is (microscopically) slower because the interpreter has to verify that it *is* a non-data descriptor (*i.e.*, look for `__set__`/`__delete__` or at least a flag left from a previous check). — Davis Herring, Feb 06 '20 at 02:12
@DavisHerring I've benchmarked and [there was nothing in it](http://dpaste.com/19XZMJS), same within margin of error. The interpreter must check for a descriptor, even with the normal attribute, it seems to be the same to check if there's anything at all there vs finding a non-data descriptor. You could delete the data descriptor out of the type, too, making the code path for the interpreter exactly the same. Maybe in other languages "lazy initialization has a cost on every access", but that doesn't seem true in Python (or rather, in Python you are paying those nanoseconds regardless) — wim, Feb 06 '20 at 04:47
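A sketch of the instance-level version with a non-data descriptor (the names `lazy_attribute` and `Client` are hypothetical; `functools.cached_property` in Python 3.8+ is a standard-library descriptor built on the same idea). After the first access the value sits in the instance `__dict__`, which takes precedence over a non-data descriptor, so `__get__` is not called again.

```python
class lazy_attribute:
    """Minimal non-data descriptor: it defines __get__ but not __set__ or
    __delete__, so a value stored in the instance __dict__ under the same
    name wins every later lookup."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:                  # accessed on the class itself
            return self
        value = self.func(obj)           # run the expensive setup once
        obj.__dict__[self.name] = value  # shadow the descriptor from now on
        return value


class Client:
    setup_calls = 0  # counts how often the setup really ran

    @lazy_attribute
    def connection(self):
        Client.setup_calls += 1
        return object()  # stand-in for a real connection


client = Client()
first = client.connection   # triggers the setup via __get__
second = client.connection  # found directly in client.__dict__
```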
