
The Problem Description:

I have this custom "checksum" function:

NORMALIZER = 0x10000


def get_checksum(part1, part2, salt="trailing"):
    """Returns a checksum of two strings."""

    combined_string = part1 + part2 + " " + salt if part2 != "***" else part1
    ords = [ord(x) for x in combined_string]

    checksum = ords[0]  # initial value

    # TODO: document the logic behind the checksum calculations
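    # zip pairs each character code with the one before it:
    # (ords[1], ords[0]), (ords[2], ords[1]), ...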
    iterator = zip(ords[1:], ords)
    checksum += sum(x + 2 * y if counter % 2 else x * y
                    for counter, (x, y) in enumerate(iterator))
    checksum %= NORMALIZER

    return checksum

I want to benchmark this function on both Python 3.6 and PyPy. I'd like to see whether it performs better on PyPy, but I'm not completely sure what the most reliable and clean way to do that is.

What I've tried and the Question:

Currently, I'm using timeit for both:

$ python3.6 -mtimeit -s "from test import get_checksum" "get_checksum('test1' * 100000, 'test2' * 100000)"
10 loops, best of 3: 329 msec per loop

$ pypy -mtimeit -s "from test import get_checksum" "get_checksum('test1' * 100000, 'test2' * 100000)"
10 loops, best of 3: 104 msec per loop

My concern is that I'm not sure timeit is the right tool for the job on PyPy, because of the potential JIT warm-up overhead.
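
For illustration, a crude way to see the warm-up effect I'm worried about would be to time consecutive calls by hand. This is just a sketch (it uses timeit.default_timer so the same snippet runs on both interpreters):

from timeit import default_timer
from test import get_checksum

part1, part2 = 'test1' * 100000, 'test2' * 100000

# On PyPy the first few iterations are expected to be slower,
# because the JIT has not compiled the hot path yet.
for i in range(10):
    start = default_timer()
    get_checksum(part1, part2)
    print(i, default_timer() - start)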

Plus, PyPy itself prints the following warning before reporting the results:

WARNING: timeit is a very unreliable tool. use perf or something else for real measurements
pypy -m pip install perf
pypy -m perf timeit -s 'from test import get_checksum' "get_checksum('test1' * 1000000, 'test2' * 1000000)"
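
For reference, the same measurement could also be driven from the perf Python API rather than the CLI. This is only a sketch (it assumes the perf package from the warning is installed; newer releases of it are published under the name pyperf with the same API, in which case the import changes accordingly):

# bench.py
import perf

from test import get_checksum

runner = perf.Runner()
# bench_func spawns worker processes and does warm-up runs itself,
# which is what a plain timeit run does not do for a JIT.
runner.bench_func('get_checksum',
                  get_checksum, 'test1' * 100000, 'test2' * 100000)

The script would then be run the same way on each interpreter, e.g. python3.6 bench.py and pypy bench.py.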

What would be the best and most accurate approach to test the same exact function performance across these and potentially other Python implementations?

alecxe
  • Does that test (time) anything at all? It seems you only perform a setup and no real test command? – MSeifert Feb 07 '17 at 17:25
  • @MSeifert ah, I am an idiot, you are absolutely right. There was only setup there; I've updated the question, leaving the latter part. Thanks! – alecxe Feb 07 '17 at 17:39

2 Answers


You could increase the number of repetitions with the --repeat parameter in order to improve timing accuracy. See:

https://docs.python.org/2/library/timeit.html
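
For example (the -r/--repeat flag is accepted by the timeit CLI on both interpreters):

$ python3.6 -mtimeit -r 10 -s "from test import get_checksum" "get_checksum('test1' * 100000, 'test2' * 100000)"
$ pypy -mtimeit -r 10 -s "from test import get_checksum" "get_checksum('test1' * 100000, 'test2' * 100000)"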

Haroldo_OK

It is not entirely clear what you are trying to measure. "Performance" can mean a variety of things depending on your use-case.

  • Are you trying to measure raw speed of the function once everything is warmed up (JIT in particular but also library imports, file loading, etc...)? Then you probably want to --repeat a lot like Haroldo_OK suggested. With enough repetitions, the time spent in other parts of your code would become progressively "insignificant".
  • Are you measuring things for the sake of learning or for a real-world use case? If the latter, it is probably a good idea to test your code under similar conditions (length of the strings you're passing to your function, number of iterations, warm/cold calling of your code, ...). My impression is that using the Python interface instead of the CLI would give you more flexibility to measure exactly what you're after (see the sketch below).

Of note, timeit turns off garbage collection, so if you're looking for "real world" measurements, you may want to turn it back on (see the link above for how to do it, and the sketch below).
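
For instance, a minimal sketch of both ideas through the Python interface, re-enabling GC in the setup string as the timeit docs describe and looking at the spread of several repeats:

import timeit

setup = "import gc; gc.enable(); from test import get_checksum"
stmt = "get_checksum('test1' * 100000, 'test2' * 100000)"

# Each element of the result is the total time for `number` calls;
# on PyPy the later repeats should get faster as the JIT warms up.
print(timeit.repeat(stmt, setup=setup, repeat=10, number=10))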

If you're trying to improve the speed, a profiler like cProfile, which is supported by both Python 3.6 and PyPy, could help isolate the code whose speed you want to measure.
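
As a rough illustration (assuming the same test module as in the question):

import cProfile
from test import get_checksum

# Sort by cumulative time to see where get_checksum spends its time.
cProfile.run("get_checksum('test1' * 100000, 'test2' * 100000)",
             sort="cumulative")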

I'm not actually answering your question, but I hope it helps :)

Laurent S