Optimization of importing modules in Python

Question

I am reading David Beazley's Python Reference book and he makes a point:

For example, if you were performing a lot of square root operations, it is faster to use 'from math import sqrt' and 'sqrt(x)' rather than typing 'math.sqrt(x)'.

and:

For calculations involving heavy use of methods or module lookups, it is almost always better to eliminate the attribute lookup by putting the operation you want to perform into a local variable first.

I decided to try it out:

first()

def first():
    from collections import defaultdict
    x = defaultdict(list)

second()

def second():
    import collections
    x = collections.defaultdict(list)

The results were:

2.15461492538
1.39850616455

Optimizations such as these probably don't matter to me. But I am curious as to why the opposite of what Beazley has written comes out to be true. And note that there is a difference of 1 second, which is singificant given the task is trivial.

Why is this happening?

UPDATE:

I am getting the timings like:

print timeit('first()', 'from __main__ import first');
print timeit('second()', 'from __main__ import second');

Give the full listings of all programs used in your benchmarks. Otherwise we just have to guess which, although mildly diverting, is not productive. — David Heffernan, May 08 '11 at 07:22
@David: This is the full program. I don't know what else should I give. — user225312, May 08 '11 at 07:23
@David: No I have not. Why would I do that? There is nothing more than this. Except `from timeit import timeit`. What else would be there in this program? You are most welcome to try it out on your computer. — user225312, May 08 '11 at 07:39
Because here on SO we regularly encounter benchmarking questions with crucial details omitted because the OP did not think they were crucial. — David Heffernan, May 08 '11 at 08:13
Also, it makes it easier for others to try out your benchmarks, and anything that makes your answerers' jobs easier is something you should do. — Douglas Leeder, May 08 '11 at 12:45

Douglas Leeder · Accepted Answer · 2011-05-08T12:54:36.503

6

The from collections import defaultdict and import collections should be outside the iterated timing loops, since you won't repeat doing them.

I guess that the from syntax has to do more work that the import syntax.

Using this test code:

#!/usr/bin/env python

import timeit

from collections import defaultdict
import collections

def first():
    from collections import defaultdict
    x = defaultdict(list)

def firstwithout():
    x = defaultdict(list)

def second():
    import collections
    x = collections.defaultdict(list)

def secondwithout():
    x = collections.defaultdict(list)

print "first with import",timeit.timeit('first()', 'from __main__ import first');
print "second with import",timeit.timeit('second()', 'from __main__ import second');

print "first without import",timeit.timeit('firstwithout()', 'from __main__ import firstwithout');
print "second without import",timeit.timeit('secondwithout()', 'from __main__ import secondwithout');

I get results:

first with import 1.61359190941
second with import 1.02904295921
first without import 0.344709157944
second without import 0.449721097946

Which shows how much the repeated imports cost.

edited May 08 '11 at 12:54

answered May 08 '11 at 07:14

Douglas Leeder

52,368
9
94
137

If I place them outside, it defeats the entire purpose of my question! – user225312 May 08 '11 at 07:22
Aah, I see what you were trying to mean. – user225312 May 08 '11 at 07:59
1

@A A: You appear to be misunderstanding what Beazley wrote. He wasn't talking about the differences in repeated `import`s. It makes no sense to be arguing about the difference in times due to repeated imports. If you do move the `import`s outside the loop, what he says is true although in this trivial case, `first` is only slightly faster. – Ned Deily May 08 '11 at 08:02
@Ned: Hmm, ok. Which style is preferred then? – user225312 May 08 '11 at 08:15
1

@A A: For `import`s, either `import x` or `from x import y` is generally OK according to the Python Style Guide (PEP 8). As he writes, there may be reasons to prefer the latter when dealing with a large number of calls in loops. Some people frown on the latter due to potential namespace confusion. What is more widely frowned on is `from x import *`. The meta point is that you don't want to be repeatedly `import`ing for no reason. As PEP 8 says: "Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants." – Ned Deily May 08 '11 at 08:37

eat · Answer 2 · 2011-05-08T08:18:42.960

I'll get also similar ratios between first(.) and second(.), only difference is that the timings are in microsecond level.

I don't think that your timings measure anything useful. Try to figure out better test cases!

Update:
FWIW, here is some tests to support David Beazley's point.

import math
from math import sqrt

def first(n= 1000):
    for k in xrange(n):
        x= math.sqrt(9)

def second(n= 1000):
    for k in xrange(n):
        x= sqrt(9)

In []: %timeit first()
1000 loops, best of 3: 266 us per loop
In [: %timeit second()
1000 loops, best of 3: 221 us per loop
In []: 266./ 221
Out[]: 1.2036199095022624

So first() is some 20% slower than second().

Can you please post how are you testing? – user225312 May 08 '11 at 07:58 — user225312, May 08 '11 at 07:58

score 1 · Answer 3 · answered May 08 '11 at 07:14

1

My guess, your test is biased and the second implementation gains from the first one already having loaded the module, or just from having it loaded recently.

How many times did you try it? Did you switch up the order, etc..

answered May 08 '11 at 07:14

Mantas Vidutis

16,376
20
76
92

I've tested it, order doesn't matter. – Roman Bodnarchuk May 08 '11 at 07:34

score 1 · Answer 4 · answered May 08 '11 at 07:14

1

first() doesn't save anything, since the module must still be accessed in order to import the name.

Also, you don't give your timing methodology but given the function names it seems that first() performs the initial import, which is always longer than subsequent imports since the module must be compiled and executed.

answered May 08 '11 at 07:14

Ignacio Vazquez-Abrams

776,304
153
1,341
1,358

@Ignacio: I tried what you said, but no. The second approach is still faster. – user225312 May 08 '11 at 07:30
I tried putting `second()` above `first()` so as to let it load the modules. So I let `second()` perform the initial import. – user225312 May 08 '11 at 07:33
Yeah, no. You have to *call* `second()` first, since nothing happens until they're actually called. – Ignacio Vazquez-Abrams May 08 '11 at 07:35
@Ignacio: I isolated the calls and still the same result. – user225312 May 08 '11 at 07:36
"Isolated the calls"? What does that even mean? – Ignacio Vazquez-Abrams May 08 '11 at 07:38
Hehe. I meant that I made two separate files, put `first()` in one and `second()` in the other and then timed them. *Just to be sure*. – user225312 May 08 '11 at 07:39

score 1 · Answer 5 · edited May 23 '17 at 11:55

There is also the question of efficiency of reading/understanding the source code. Here's a real live example (code from a stackoverflow question)

Original:

import math

def midpoint(p1, p2):
   lat1, lat2 = math.radians(p1[0]), math.radians(p2[0])
   lon1, lon2 = math.radians(p1[1]), math.radians(p2[1])
   dlon = lon2 - lon1
   dx = math.cos(lat2) * math.cos(dlon)
   dy = math.cos(lat2) * math.sin(dlon)
   lat3 = math.atan2(math.sin(lat1) + math.sin(lat2), math.sqrt((math.cos(lat1) + dx) * (math.cos(lat1) + dx) + dy * dy))
   lon3 = lon1 + math.atan2(dy, math.cos(lat1) + dx)
   return(math.degrees(lat3), math.degrees(lon3))

Alternative:

from math import radians, degrees, sin, cos, atan2, sqrt

def midpoint(p1, p2):
   lat1, lat2 = radians(p1[0]), radians(p2[0])
   lon1, lon2 = radians(p1[1]), radians(p2[1])
   dlon = lon2 - lon1
   dx = cos(lat2) * cos(dlon)
   dy = cos(lat2) * sin(dlon)
   lat3 = atan2(sin(lat1) + sin(lat2), sqrt((cos(lat1) + dx) * (cos(lat1) + dx) + dy * dy))
   lon3 = lon1 + atan2(dy, cos(lat1) + dx)
   return(degrees(lat3), degrees(lon3))

score 0 · Answer 6 · answered May 08 '11 at 23:25

Write your code as usual, importing a module and referencing its modules and constants as module.attribute. Then, either prefix your functions with the decorator for binding constants or bind all modules in your program by using the bind_all_modules function below:

def bind_all_modules():
    from sys import modules
    from types import ModuleType
    for name, module in modules.iteritems():
        if isinstance(module, ModuleType):
            bind_all(module)

def bind_all(mc, builtin_only=False, stoplist=[],  verbose=False):
    """Recursively apply constant binding to functions in a module or class.

    Use as the last line of the module (after everything is defined, but
    before test code).  In modules that need modifiable globals, set
    builtin_only to True.

    """
    try:
        d = vars(mc)
    except TypeError:
        return
    for k, v in d.items():
        if type(v) is FunctionType:
            newv = _make_constants(v, builtin_only, stoplist,  verbose)
            try: setattr(mc, k, newv)
            except AttributeError: pass
        elif type(v) in (type, ClassType):
            bind_all(v, builtin_only, stoplist, verbose)

Optimization of importing modules in Python

6 Answers6