I've just stumbled across something that makes no sense to me. Where I work, we have a number of Python CGI webpages (just a simple Apache server setup, not running Django / TurboGears or the like) and I've been getting a bit frustrated with how long it takes the scripts to run. I chucked in lots of time.time() calls and thought I'd identified the bottleneck as the import of sqlalchemy (though I now think it's probably "any big module", so the sqlalchemy tag is perhaps misplaced).

So, after trying various different things, I ended up with this example (assume the file's called 'test.py'):

#!/usr/bin/python

import time
t1 = time.time()
import sqlalchemy
print time.time() - t1

If I run test.py at the command prompt (by setting it executable), it typically shows about 0.7 seconds (+/- 0.1 seconds) for that import statement.

But, if I call

python -c "execfile('test.py')"

I get a speed-up of about a factor of 10.
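A minimal sketch of how the two invocations could be timed from the outside, so that interpreter start-up is included in the measurement. The helper name `time_command` and the `repeats` parameter are made up for illustration, and the code is written so that it should run on Python 3 as well as Python 2.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Run argv `repeats` times and return the wall-clock duration of each run."""
    durations = []
    for _ in range(repeats):
        start = time.time()
        subprocess.call(argv)  # blocks until the child process exits
        durations.append(time.time() - start)
    return durations


if __name__ == "__main__":
    # Hypothetical usage for the setup in the question (Python 2 only,
    # because execfile was removed in Python 3):
    #   time_command(["python", "test.py"])
    #   time_command(["python", "-c", "execfile('test.py')"])
    # A trivial command is used here so the sketch runs anywhere.
    print(time_command([sys.executable, "-c", "pass"]))
```

Repeating each run matters because the first one may warm the filesystem cache and make later runs look faster than they otherwise would.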

So I thought I'd wrap some of my python CGI scripts with a little tcsh script that calls

python -c "execfile('mypythoncgiscript.py')"

and I get speed-ups typically about a factor of 2-3, and, importantly, the data returned is still correct.

With a cpu-heavy rather than import-heavy script, e.g:

import time
t1 = time.time()
a = 0
for i in xrange(10000000):
    a += 1
print time.time() - t1

I get a very slight slowdown using execfile, which is what I would have expected from the slight execfile overhead.

Does anyone know what's going on here? Can anyone reproduce similar speed differences or is my setup broken in a way that execfile somehow fixes? I thought imports behaved slightly differently within execfile (or at least, aren't necessarily visible once you've left the execfile statement) but I'm surprised by such a large difference in speed.

I'm running python 2.4 64bit on Oracle-supplied "Enterprise Linux Server release 5 (Carthage)".

FredL
  • I can't reproduce it with python 2.6/2.7/3.1 under windows with any big module I've got (don't have sqlalchemy installed). – bdew Dec 23 '10 at 10:31
  • Some pointers to check: Are "/usr/bin/python" and "python" really the same interpreter? Is SQLAlchemy doing some shenanigans on either the `__main__` module or its globals/locals (which would behave differently from execfile())? – bdew Dec 23 '10 at 10:38
  • Yup, they're both pointing to the same executable - I'm beginning to think it must be something wrong with our setup as you and WoLpH can't reproduce it and if it were general I can't believe I'm the first to find it! (Almost a second to import sqlalchemy on a pretty well-specced (and currently not very heavily loaded) server does seem ridiculously slow). I'm going to play with strace and see if there's any difference there. – FredL Dec 23 '10 at 11:45
  • @FredL: can you check where you're getting SQLAlchemy from? Perhaps it's a different install or something? `import sqlalchemy; print sqlalchemy.__file__` – Wolph Dec 23 '10 at 13:07
  • `python -vv` will print how the imports were resolved. There may also be some funny business where you may have write permission issues for the `.pyc` files. – kevpie Dec 24 '10 at 00:33
  • So, I ran python -vv for the two versions and diffed the output. The only difference was that when run using python -c "some command" it looked up import files using relative paths, whereas with python test.py it looked them up using absolute paths. This led me to try moving the scripts to a local drive, and now it runs super-quick either way, so it must be some odd NFS issue. Thanks @kevpie and @WoLpH for your help in tracking it down (when I have enough points I'll up-vote your comments :) ) – FredL Jan 04 '11 at 11:40
  • @FredL, great find. Is sqlalchemy a compressed (zip) module or is it uncompressed? Dealing with one compressed file vs. a bunch of tiny ones (smaller than a network packet) should have significantly less I/O overhead over NFS. Also, if your network is having abnormally HIGH dropped packets for UDP, it could be aggravating the small file lookups/transfers. – kevpie Jan 04 '11 at 12:53
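To check whether a slow filesystem such as an NFS mount sits on the import search path, one option is to time a file lookup in each `sys.path` directory. The sketch below is only a rough stand-in for what the import machinery does (the real lookup tries more file suffixes), and the helper name `probe_sys_path` is made up for illustration. Note that `sys.path[0]` is the script's directory under `python test.py` but an empty string (the current directory) under `python -c`.

```python
import os
import sys
import time


def probe_sys_path(module_name, paths=None):
    """Time a lookup of module_name in each search directory.

    Returns a list of (directory, seconds, found) tuples. Only a package
    directory or a plain .py file is checked, which approximates the
    interpreter's own search rather than reproducing it.
    """
    if paths is None:
        paths = sys.path
    results = []
    for directory in paths:
        # An empty entry in sys.path means the current directory.
        candidate = os.path.join(directory or os.curdir, module_name)
        start = time.time()
        found = os.path.exists(candidate) or os.path.exists(candidate + ".py")
        results.append((directory, time.time() - start, found))
    return results


if __name__ == "__main__":
    for directory, seconds, found in probe_sys_path("sqlalchemy"):
        label = directory or "(current dir)"
        print("%-50s %.4fs found=%s" % (label, seconds, found))
```

Directories whose lookups take noticeably longer than the rest are candidates for the kind of network filesystem delay described in the comments above.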

1 Answer

My guess is that there is no real difference. It only looks like a big difference. Try and test it like this to be sure:

# time python test.py
0.0514879226685
python test.py  0.06s user 0.01s system 95% cpu 0.071 total

# time python -c 'execfile("test.py")'
0.0515019893646
python -c 'execfile("test.py")'  0.06s user 0.01s system 95% cpu 0.071 total
Wolph
  • I was thinking something similar myself, but you can measure the difference in response time using Firebug (for an actual CGI page rather than just this script) and it makes a genuine difference. When I use the time command like you suggest, it does show a much lower CPU usage for the direct method vs the execfile method (~35% vs ~75%). – FredL Dec 23 '10 at 11:33
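A CPU percentage well below 100% means the process spent part of its wall-clock time waiting rather than computing. A minimal sketch for separating the two around a single import, assuming `os.times()` is an acceptable source of CPU time; the helper name `measure_import` is made up for illustration, and a standard-library module is imported in the demo so the sketch runs without sqlalchemy installed.

```python
import os
import time


def measure_import(module_name):
    """Return (wall_seconds, cpu_seconds) spent importing module_name."""
    wall_start = time.time()
    cpu_start = os.times()
    __import__(module_name)
    cpu_end = os.times()
    wall_end = time.time()
    # Fields 0 and 1 of os.times() are this process's user and system CPU time.
    cpu = (cpu_end[0] - cpu_start[0]) + (cpu_end[1] - cpu_start[1])
    return wall_end - wall_start, cpu


if __name__ == "__main__":
    # Replace "decimal" with "sqlalchemy" to measure the import from the question.
    wall, cpu = measure_import("decimal")
    print("wall: %.3fs  cpu: %.3fs" % (wall, cpu))
```

If the wall-clock figure is much larger than the CPU figure, the import is mostly waiting, for example on disk or network I/O, rather than executing Python code.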