
I'm wondering about the %timeit command in IPython.

From the docs:

%timeit [-n<N> -r<R> [-t|-c] -q -p<P> -o] setup_code

Options:

-n<N>: execute the given statement <N> times in a loop. If this value is not given, a fitting value is chosen.

-r<R>: repeat the loop iteration <R> times and take the best result. Default: 3

For example, if I write:

%timeit -n 250 -r 2 [i+1 for i in range(5000)]

So, does -n 250 execute [i+1 for i in range(5000)] 250 times? Then what does -r 2 do?

MSeifert
bner341
  • It does two runs of 250. – pvg Sep 05 '17 at 00:29
  • Why run the 250 loops twice? I don't understand the logic behind providing these options. – bner341 Sep 05 '17 at 00:33
  • What is unclear? – pvg Sep 05 '17 at 00:45
  • @bner341 After reading this for a while (and MSeifert's link, which is very detailed), I think the most straightforward answer is that you need r for the std dev. If r is 1, you only get the average run time (total time / n), and the std dev is 0. If r > 1, you still get the average run time (but now it is total time / (n*r)), and you also get the std dev of r1, r2, r3, r4, ..., where r1 = average run time of run 1 = total time of run 1 / n; r2 is defined the same way, etc. – scott.se Jan 28 '23 at 03:08

3 Answers


It specifies the number of repeats, and the repeats are used to determine the average. For example:

%timeit -n 250 a = 2
# 61.9 ns ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 250 loops each)

%timeit -n 250 -r 2 a = 2
# 62.6 ns ± 0 ns per loop (mean ± std. dev. of 2 runs, 250 loops each)

The statement is executed n * r times in total. The statistics are computed over the number of repeats (r), and each repeat consists of n "loops".
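The relationship between n, r and the reported "mean ± std. dev." can be reproduced with the standard-library `timeit` module. This is a minimal sketch, not IPython's implementation: it assumes that `%timeit -n N -r R stmt` behaves like `timeit.Timer(stmt).repeat(repeat=R, number=N)`, and it uses the population standard deviation, so IPython's exact formula and formatting may differ. The helper name `timeit_like` is hypothetical.

```python
import statistics
import timeit


def timeit_like(stmt: str, n: int, r: int) -> tuple[float, float]:
    """Hypothetical helper that mimics the 'mean ± std. dev. per loop' line.

    Assumption: -n corresponds to `number` and -r corresponds to `repeat`.
    """
    # One total time (in seconds) per repeat; each repeat runs stmt n times.
    totals = timeit.Timer(stmt).repeat(repeat=r, number=n)
    # Per-loop time of each repeat: total time of that repeat divided by n.
    per_loop = [total / n for total in totals]
    # The statistics are taken over the r per-loop values, not over n * r runs.
    mean = statistics.mean(per_loop)
    # With r == 1 there is only one value, so the spread is 0.
    stdev = statistics.pstdev(per_loop)
    return mean, stdev


mean, stdev = timeit_like("a = 2", n=250, r=2)
print(f"{mean * 1e9:.1f} ns ± {stdev * 1e9:.1f} ns per loop "
      f"(mean ± std. dev. of 2 runs, 250 loops each)")
```

Note that r only controls how many per-loop values the statistics are based on; it does not change what a single per-loop value means.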

Basically you need a large enough n so that the timing of each repeat is accurate "enough" to represent the fastest possible execution time, but you also need a large enough r to get accurate "statistics" on how trustworthy that "fastest possible execution time" measurement is (especially if you suspect that some caching could be happening).

For superficial timings you should always use an r of 3, 5 or 7 (in most cases that's large enough) and choose n as high as possible, but not so high that the timing doesn't finish in a reasonable time :-)
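When -n is not given, the docs say a fitting value is chosen. The standard library has a similar heuristic in `timeit.Timer.autorange()` (Python 3.6+), which tries loop counts 1, 2, 5, 10, 20, 50, ... until one batch takes at least 0.2 seconds. This sketch assumes IPython's own choice is similar in spirit (its exact sequence may differ) and pairs the automatically chosen n with a small fixed r, as suggested above. The value r = 5 is an arbitrary illustrative choice.

```python
import timeit

timer = timeit.Timer("[i + 1 for i in range(5000)]")

# Pick n automatically: autorange() returns (number_of_loops, total_seconds)
# for the first loop count whose total time is at least 0.2 s.
n, _ = timer.autorange()

# Use a small, fixed r and the automatically chosen n.
r = 5
totals = timer.repeat(repeat=r, number=n)

# "Take the best result": the fastest repeat, divided by n loops.
best_per_loop = min(totals) / n

print(f"n = {n}, r = {r}, best: {best_per_loop * 1e6:.1f} µs per loop")
```

A large n keeps each repeat long compared with timer overhead, while r stays small so the whole measurement still finishes quickly.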

MSeifert
  • I come back to this answer every few months and I still have no idea what `r` is for; that's too vague. – bwdm May 17 '20 at 20:47
  • @bwdm I answered a similar question in more detail [here](https://stackoverflow.com/a/59543135/5393381). Let me know if that's less vague. :) – MSeifert May 18 '20 at 20:02
timeit -n 250 <statement>

The statement will get executed 3 * 250 = 750 times (-r had a default value of 3 in older IPython versions; newer versions default to 7, which gives 7 * 250 = 1750 executions)

timeit -n 250 -r 4 <statement>

The statement will get executed 4 * 250 = 1000 times

-r: how many times to repeat the timer. In the examples above, each repeat calls the timer with -n 250, i.e. 250 executions.
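The n * r arithmetic in this answer can be checked directly by counting calls. This sketch uses the standard-library `timeit.repeat`, whose `number` and `repeat` parameters are, as far as I know, what IPython's -n and -r map onto; `count_executions` is a hypothetical helper.

```python
import timeit


def count_executions(n: int, r: int) -> int:
    """Hypothetical helper: count how often timeit runs a statement."""
    calls = 0

    def statement() -> None:
        # Each execution of the timed statement bumps the counter.
        nonlocal calls
        calls += 1

    # timeit.repeat runs `number` executions per repeat, `repeat` times.
    timeit.repeat(statement, repeat=r, number=n)
    return calls


print(count_executions(250, 3))  # 750
print(count_executions(250, 4))  # 1000
```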

Nir Alfasi

A more statistical way to explain it is as a bootstrapping estimate of the distribution of some statistics (specifically, their mean and standard deviation). In that context, "r" can be seen as the number of samples and "n" as the size of each sample.

  • Are you implying that taking a standard deviation of the samples (each consisting of n runs) would yield a more accurate estimate of the standard deviation than just taking a sample standard deviation over all (n*r) of the runs? Otherwise I don't see why one would want to split the results into r samples, rather than just basing the inference on all n*r runs. It seems to me that the real reason is [the resolution of the timer itself](https://stackoverflow.com/questions/48258008/n-and-r-arguments-to-ipythons-timeit-magic/59543135#59543135), but let me know if there's another, statistical reason. – Dahn Jul 05 '20 at 17:17
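One way to see the timer-resolution argument from the comment above is to compare the clock's reported resolution with the cost of a single execution. This sketch relies on `time.get_clock_info` and on `timeit.default_timer` being `time.perf_counter` in Python 3; the loop count of 1,000,000 is an arbitrary choice, and the printed numbers depend on the platform.

```python
import time
import timeit

# timeit's default clock is time.perf_counter on Python 3.3+.
resolution = time.get_clock_info("perf_counter").resolution

timer = timeit.Timer("a = 2")
n = 1_000_000
# Average cost of one execution, measured over many loops at once.
per_loop = timer.timeit(number=n) / n

print(f"reported clock resolution: {resolution * 1e9:.1f} ns")
print(f"one execution:             {per_loop * 1e9:.1f} ns")
# If one execution is close to (or below) the clock resolution, timing each
# run individually mostly measures clock granularity and call overhead.
# Timing n runs together and dividing by n shrinks that error's relative size.
```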