I am learning how to use the %timeit magic command in IPython (in a Jupyter notebook with Python 3). If I time the NumPy sorting function for arrays of various sizes:

import numpy as np

n = 10
for i in range(n):
    arr = np.random.rand(2**(i+10))  # array sizes 2**10 ... 2**19
    %timeit -n 2 np.sort(arr)

Then I get a sequence of roughly increasing times, as I would expect.

If I try to pack this code into a function, however, I do not get the output I expect: all of the times are about the same!

def my_func(n):
    for i in range(n):
        arr = np.random.rand(2**(i+10))
        %timeit -n 10 np.sort(arr)
my_func(10)

Please see the Jupyter notebook showing the results here.

Can anyone explain either what I am doing wrong, or what I am misunderstanding?

MSeifert
RinRisson
  • `%timeit` is a special syntax Jupyter supports, it’s not actually valid Python code. So I would expect that Jupyter parses this separately and it has an effect on the whole executed command. Try using the `timeit` module directly instead. – poke Sep 03 '17 at 17:04
  • In the second case, you're just repeatedly sorting the global `arr` you made in the first cell, not the local arr. Change the local variable name to something else and you'll see what %timeit is complaining about. – pvg Sep 03 '17 at 17:30

1 Answer


%timeit isn't (currently) supposed to work correctly inside functions. If you start a fresh notebook (or restart yours) and run only:

import numpy as np
def my_func(n):
    for i in range(n):
        arr = np.random.rand(2**(i+10))
        %timeit -n 10 np.sort(arr)

my_func(10)

it will raise a NameError:

NameError: name 'arr' is not defined

That's because %timeit only inspects global variables, not local ones, so it ignores the arr = np.random.rand(2**(i+10)) defined inside your function.
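The same kind of scoping effect can be reproduced with the standard-library timeit module, which may help to see why a local name goes missing. This is only an analogy, not a description of how IPython implements %timeit: a statement passed as a string is run in a namespace chosen by timeit, so a caller's local variable is invisible unless a namespace is passed explicitly through the globals argument. The two function names are hypothetical, and a plain list with sorted() stands in for the NumPy array.

```python
import timeit


def time_with_default_namespace():
    arr = list(range(1000))  # local variable, like arr inside my_func
    try:
        # The statement string runs in timeit's own namespace,
        # so the local arr cannot be found by name.
        timeit.timeit("sorted(arr)", number=10)
    except NameError as exc:
        return str(exc)
    return None


def time_with_local_namespace():
    arr = list(range(1000))
    # Passing the local namespace explicitly makes arr visible to the statement.
    return timeit.timeit("sorted(arr)", number=10, globals=locals())


print(time_with_default_namespace())
print(time_with_local_namespace())
```

The first call reports the missing name, while the second returns the elapsed time in seconds as a float.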

The following code makes this obvious:

import numpy as np

arr = np.array([1, 2, 3])

def my_func(n):
    for i in range(n):
        arr = np.random.rand(2**(i+10))
        %timeit -n 2 -r 1 print(arr)

my_func(10)

which prints:

[1 2 3]
[1 2 3]
3.44 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 2 loops each)
[1 2 3]
[1 2 3]
670 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 2 loops each)
[1 2 3]
[1 2 3]
2.04 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 2 loops each)
[1 2 3]
[1 2 3]
451 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 2 loops each)
[1 2 3]
[1 2 3]
906 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 2 loops each)
[1 2 3]
[1 2 3]
1.01 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 2 loops each)
[1 2 3]
[1 2 3]
767 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 2 loops each)
[1 2 3]
[1 2 3]
890 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 2 loops each)
[1 2 3]
[1 2 3]
1.28 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 2 loops each)
[1 2 3]
[1 2 3]
919 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 2 loops each)

So in your case it always found the last arr from your non-function runs (which was global). That also explains why the times were roughly identical inside the function: it always sorted the same arr.
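One possible workaround, along the lines of the comment suggesting the timeit module, is to time a callable instead of a statement string. This is a minimal sketch rather than a drop-in replacement for %timeit: the function name time_sorts and its number and repeat parameters are assumptions made for illustration, and the output format differs from what %timeit prints.

```python
import timeit

import numpy as np


def time_sorts(n, number=10, repeat=3):
    """Return (array_size, best_seconds_per_call) for sizes 2**10 ... 2**(n+9)."""
    results = []
    for i in range(n):
        arr = np.random.rand(2**(i + 10))
        # The lambda closes over the local arr, so no lookup by name
        # in a separate namespace is involved.
        timings = timeit.repeat(lambda: np.sort(arr), number=number, repeat=repeat)
        results.append((arr.size, min(timings) / number))
    return results


for size, seconds in time_sorts(10):
    print(f"{size:>7d} elements: {seconds * 1e6:9.1f} µs per sort")
```

Because each timing uses the array created in that iteration, the reported times should grow with the array size, as in the version run outside a function.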

MSeifert