I get the following results on my machine:

Python 3.2.2 (default, Sep  4 2011, 09:51:08) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.timeit('factorial(10000)', 'from math import factorial', number=100)
1.9785256226699202
>>>

Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.timeit('factorial(10000)', 'from math import factorial', number=100)
9.403801111593792
>>>

I thought this might have something to do with `int`/`long` conversion, but `factorial(10000L)` isn't any faster in 2.7.
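The 2.7 check with the explicit long literal is just the same call:

timeit.timeit('factorial(10000L)', 'from math import factorial', number=100)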

  • 1
    10,000! - do you realize just how large that number is? http://gimbo.org.uk/texts/ten_thousand_factorial.txt – duffymo Mar 22 '12 at 01:31
  • 7
    @duffymo That doesn't explain the speed difference – Ismail Badawi Mar 22 '12 at 01:32
  • I'm not trying to explain it. I'm just wondering if the OP is aware, that's all. int/long conversion hardly seems relevant. Where's your answer, isbadawi? – duffymo Mar 22 '12 at 01:32
  • 11
    Maybe Python 3 is faster than Python 2. This would be an interesting question if it were the other way around. – Greg Hewgill Mar 22 '12 at 01:33
  • 1
    I'm well aware of how big the number is. I thought that it might be generating `int`s, and then having to re-convert them to multiply, but that didn't explain things. I've seen reports of certain things being faster in 3.x and certain other things being faster in 2.x, but a nearly factor-of-5 difference is, AFAICT, highly unusual. – Karl Knechtel Mar 22 '12 at 01:38
  • 3
    If you're that curious, you should dive into the source :). – Corbin Mar 22 '12 at 01:38
  • The Java JVM uses statistics at runtime to optimize. As a result, micro-benchmarks like this can be misleading. It's usually recommended that tests be repeated to get a truer picture after "burn in". Could it be that something similar is at work with Python? – duffymo Mar 22 '12 at 01:39
  • @Corbin You called it, totally different algorithm. @duffymo `CPython` doesn't do that kind of thing, though some other implementations do. – agf Mar 22 '12 at 01:50

1 Answer


Python 2 uses the naive factorial algorithm, multiplying the running result by each integer from 1 to x in turn:

for (i=1 ; i<=x ; i++) {
    iobj = (PyObject *)PyInt_FromLong(i);
    if (iobj == NULL)
        goto error;
    newresult = PyNumber_Multiply(result, iobj);
    Py_DECREF(iobj);
    if (newresult == NULL)
        goto error;
    Py_DECREF(result);
    result = newresult;
}
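In pure Python, the same loop looks like this (an illustrative sketch, not CPython code):

def naive_factorial(x):
    # Multiply the running result by each integer from 1 to x in turn.
    result = 1
    for i in range(1, x + 1):
        result *= i
    return result

Each pass multiplies the entire, ever-growing partial product by one small integer, which is where almost all of the time goes for large x.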

Python 3 uses the divide-and-conquer factorial algorithm:

 * factorial(n) is written in the form 2**k * m, with m odd. k and m are
 * computed separately, and then combined using a left shift.
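A simplified pure-Python sketch of the divide-and-conquer idea, using binary splitting over the product range (illustrative only; the real code also separates out the power of two, as the comment above says):

def product(lo, hi):
    # Recursively multiply the integers in [lo, hi], splitting the range
    # so that each multiplication pairs operands of similar size.
    if lo > hi:
        return 1
    if lo == hi:
        return lo
    mid = (lo + hi) // 2
    return product(lo, mid) * product(mid + 1, hi)

def fact(n):
    return product(2, n)

Keeping the operands balanced lets CPython's large-integer multiplication (Karatsuba above a size cutoff) do far less total work than the naive loop, which always multiplies a huge partial product by a tiny integer.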

See the Python bug tracker issue for the discussion. Thanks to DSM for pointing that out.

  • 2
    Interestingly, and kind of sadly, despite being ostensibly implemented in C, `math.factorial` in Python 2.x doesn't seem too much faster than just using a naive `for` loop in pure Python. The overhead of using Python long integers seems to eat up whatever gains can be had from looping in C. As was commented in the linked Python bugtracker thread, if you really want performance for this kind of thing, use `gmpy`. – John Y Oct 15 '12 at 19:59
  • @JohnY I'm not sure which implementation you pick is important, beyond the algorithm chosen. It's impossible to get good performance with the naive algorithm, whether you hand code it in assembly or write it in a high level language. – agf Oct 15 '12 at 20:27
  • @agf: I'm not expecting one naive algorithm to have a better big-O complexity than the same naive algorithm in a different language. I still think it's kind of funny and sad that `math.factorial` doesn't even have much of a constant-factor improvement over the pure-Python naive algorithm. On my PC, it was only a few percent faster. – John Y Oct 16 '12 at 05:48
  • 1
    @JohnY How much faster would an equally unoptimized non-Python C implementation of the naive algorithm be? You're assuming it would be much faster, and using that as evidence of poor performance of C-level Python objects, without establishing that. – agf Oct 16 '12 at 07:03
  • 2
    @agf: I'm not assuming anything, and I'm not saying C-level Python object performance is poor. I don't even know what an "equally unoptimized" implementation would be, because you have to implement bignums if you want to replicate the full functionality. The thing I am surprised by is the fact that the Python devs decided to include a barely-better-than-pure-Python function in the `math` module, a module which was intended as (and in seemingly all other respects is) a thin wrapper for pure-C routines (which factorial is not). – John Y Oct 16 '12 at 14:27
  • @agf: I did use language in my earlier comments that implies I'm expecting C to be much faster than Python for "substantially similar" algorithms. The thing is, that is not debated. If you have a function that only deals with machine integers and floats, yes, absolutely the naive C implementation is usually several times (often dozens of times) faster than the equivalently naive pure-Python implementation. But machine integers and floats are very limiting in some situations, factorial being one of them. – John Y Oct 16 '12 at 14:34
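For anyone who wants to reproduce John Y's comparison, a minimal timing sketch (the function name and iteration count are illustrative; absolute numbers vary by machine, and on 3.x `math.factorial` should win by a wide margin given the better algorithm):

import timeit

setup = """
from math import factorial

def naive_factorial(n):
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
"""

# C-level math.factorial vs. the naive pure-Python loop.
print(timeit.timeit('factorial(10000)', setup, number=10))
print(timeit.timeit('naive_factorial(10000)', setup, number=10))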