
I implemented an algorithm in Rust for speed, which is then built into a Python module. Calling the function is indeed much faster than the Python implementation. But I noticed an interesting quirk: running the function many times (say, 1 million) is on average much faster than running it just once or a few times.

print(timeit.timeit(lambda: blob(9999, 16, (3, 5), (4, 11), (2, 4)), number=1))
print(timeit.timeit(lambda: blob(9999, 16, (3, 5), (4, 11), (2, 4)), number=1000000) / 1000000)

Output:

1.5100000382517464e-05
2.1137116999998398e-06

As you can see, when executed 1,000,000 times the function is, on average, about 7 times faster than when it is run only once.

Any idea why this is happening? Any help would be appreciated.

If the code of the Rust function is needed to pinpoint the problem, just send a comment and I'll put it here :)

DaNubCoding
  • `timeit` is basically worthless with `number=1`, since you end up measuring its overhead too. Use `timeit.Timer(...).autorange()` if you want timeit to figure out a "suitable" number for your function (see the sketch after these comments). – AKX Aug 26 '22 at 14:10
  • In other words: I don't think it's the function being faster or slower, it's your measurement methodology screwing things up. – AKX Aug 26 '22 at 14:11
  • Imagine if a function takes 3 seconds the first time due to processing overhead, but then caches the info, so that subsequent calls take sub-second times. When you then take the average of 1000 runs, for example, you will likely get a sub-1-second time because the function runs faster after the initial call. That could likely explain the phenomenon you were noticing in this case. – rv.kvetch Aug 26 '22 at 14:16
  • Ah... I ran timeit and also profiled the function, both giving this result, so I just concluded it was doing something weird. Now that I'm timing it with just plain time.time(), the more runs, the slower. Thanks for pointing that out, sometimes my stupidity knows no bounds – DaNubCoding Aug 26 '22 at 14:33
  • Also, be aware that what AKX said is basically true for all benchmarking / profiling methods. That's why you always take an average over a lot (whatever that means) of samples, and sometimes you also run the function a few times beforehand without taking those runs into account (for caching issues, as rv.kvetch said). – jthulhu Aug 26 '22 at 14:52
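
A minimal sketch of the advice in these comments, assuming `blob` is importable from the compiled module (the module name below is hypothetical, since the question doesn't give it): warm the function up once, then let `timeit.Timer.autorange()` pick a suitable repeat count.

import timeit
from my_rust_module import blob  # hypothetical module name; not given in the question

args = (9999, 16, (3, 5), (4, 11), (2, 4))
blob(*args)  # warm-up call, so one-time initialization isn't measured

timer = timeit.Timer(lambda: blob(*args))
number, total = timer.autorange()  # repeats the call until the total time is >= 0.2 s
print(f"{total / number:.3e} s per call over {number} runs")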

1 Answer


The easiest way to measure something (setting cache issues aside) is to take time differences in some time unit (the smaller the unit, the more precision):

use std::time::Instant;

let start = Instant::now();
f();                             // the function under test
let duration = start.elapsed();  // wall-clock time of a single call
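
On the Python side, a comparable single-shot measurement can use `time.perf_counter()` (a sketch; `f` stands for the function under test, as in the Rust snippet above):

import time

start = time.perf_counter()  # high-resolution monotonic clock
f()
duration = time.perf_counter() - start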

But, as others pointed out, caching happens at multiple levels. When you run the same code many times, there is always a window for the mechanisms dedicated to live optimization to cache data and provide faster execution times.

Note how pyo3 ultimately ends up interacting with C code on the Python side, which already has optimizations at much lower levels (even intermediate compiled units, like .pyc files) to speed up execution-intensive processes (especially repetitive ones) by applying different techniques, and of course caching, to save time and space.
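
To make that warm-up effect visible, one can time the cold first call separately from the steady-state average. A sketch, reusing `blob` from the question:

import time

def bench(func, *args, repeats=1_000_000):
    start = time.perf_counter()
    func(*args)
    first = time.perf_counter() - start  # cold first call

    start = time.perf_counter()
    for _ in range(repeats):
        func(*args)
    average = (time.perf_counter() - start) / repeats  # steady-state average
    return first, average

first, average = bench(blob, 9999, 16, (3, 5), (4, 11), (2, 4))
print(f"first call: {first:.3e} s, average: {average:.3e} s")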

Alex Vergara