I'm running Python 2.7.10 on a 2.7 GHz i5 machine with 16 GB of RAM, under OS X 10.11.5.
I've observed this phenomenon many times in many different kinds of examples, so the example below, though a bit contrived, is representative. It's just what I happened to be working on earlier today, when my curiosity was finally piqued.
>>> from timeit import timeit
>>> timeit('unicodedata.category(chr)', setup = 'import unicodedata, random; chr=unichr(random.randint(0,50000))', number=100)
3.790855407714844e-05
>>> timeit('unicodedata.category(chr)', setup = 'import unicodedata, random; chr=unichr(random.randint(0,50000))', number=1000)
0.0003371238708496094
>>> timeit('unicodedata.category(chr)', setup = 'import unicodedata, random; chr=unichr(random.randint(0,50000))', number=10000)
0.014712810516357422
>>> timeit('unicodedata.category(chr)', setup = 'import unicodedata, random; chr=unichr(random.randint(0,50000))', number=100000)
0.029777050018310547
>>> timeit('unicodedata.category(chr)', setup = 'import unicodedata, random; chr=unichr(random.randint(0,50000))', number=1000000)
0.21139287948608398
You'll notice that, from 100 to 1000, there's a factor-of-10 increase in the time, as expected. However, from 1e3 to 1e4 it's more like a factor of 50, and then a factor of 2 from 1e4 to 1e5 (so a total factor of 100 from 1e3 to 1e5, which is expected).
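To make that concrete, here's a quick sketch that just recomputes the step-to-step ratios from the session above (the timings are copied verbatim, so nothing here is new data; each step multiplies number by 10, so a linear scale would give ~10 everywhere):

timings = [3.790855407714844e-05, 0.0003371238708496094,
           0.014712810516357422, 0.029777050018310547,
           0.21139287948608398]
# Ratio between consecutive measurements; linear scaling => ~10 each.
for prev, cur in zip(timings, timings[1:]):
    print('factor: %.1f' % (cur / prev))
# factor: 8.9
# factor: 43.6
# factor: 2.0
# factor: 7.1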
I'd figured that there must be some sort of caching-based optimization going on, either in the actual process being timed or in timeit itself, but I can't quite figure out empirically whether that's the case. The imports don't seem to matter, as can be observed with this most basic example:
>>> timeit('1==1', number=10000)
0.0005490779876708984
>>> timeit('1==1', number=100000)
0.01579904556274414
>>> timeit('1==1', number=1000000)
0.04653501510620117
where from 1e4 to 1e6 there's roughly the expected overall factor of 1e2 in time, but the intermediate steps are ~30 and ~3.
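Another way to look at the same numbers is per-call time (total divided by number), which should be roughly constant if the loop scaled linearly; here the middle measurement is the odd one out:

# Per-call times for the 1==1 runs above; linear scaling would make
# these three values (roughly) equal.
for number, total in [(10000, 0.0005490779876708984),
                      (100000, 0.01579904556274414),
                      (1000000, 0.04653501510620117)]:
    print('number=%7d  per-call: %.3e s' % (number, total / number))
# number=  10000  per-call: 5.491e-08 s
# number= 100000  per-call: 1.580e-07 s
# number=1000000  per-call: 4.654e-08 s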
I could do more ad hoc data collection, but I haven't got a hypothesis in mind at this point.
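For what it's worth, if I were to collect that data systematically, a minimal sketch might look like this (the sweep range and the best-of-5 choice are just illustrative; taking the min of timeit.repeat is a common way to damp scheduler noise):

from timeit import repeat

# Sweep `number` over several orders of magnitude and report the
# best-of-5 per-call time; purely linear timing would make the
# per-call column flat.
for exponent in range(2, 7):
    number = 10 ** exponent
    best = min(repeat('1 == 1', number=number, repeat=5))
    print('number=%8d  total=%.3e s  per-call=%.3e s'
          % (number, best, best / number))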
Any notion as to why the scaling is non-linear at certain intermediate numbers of runs?