6

I want to measure the number of clock cycles it takes to do an addition operation in Python 3.

I wrote a program to calculate the average time of an addition operation:

from timeit import timeit

def test(n):
    for i in range(n):
        1 + 1

if __name__ == '__main__':

    times = {}
    for i in [2 ** n for n in range(10)]:
        t = timeit("test(%d)" % i, setup="from __main__ import test", number=100000)
        times[i] = t
        print("%d additions takes %f" % (i, t))

    keys = sorted(list(times.keys()))

    for i in range(len(keys) - 2):
        print("1 addition takes %f" % ((times[keys[i+1]] - times[keys[i]]) / (keys[i+1] - keys[i])))

Output:

16 additions takes 0.288647
32 additions takes 0.422229
64 additions takes 0.712617
128 additions takes 1.275438
256 additions takes 2.415222
512 additions takes 5.050155
1024 additions takes 10.381530
2048 additions takes 21.185604
4096 additions takes 43.122559
8192 additions takes 88.323853
16384 additions takes 194.353927
1 addition takes 0.008292
1 addition takes 0.010068
1 addition takes 0.008654
1 addition takes 0.010318
1 addition takes 0.008349
1 addition takes 0.009075
1 addition takes 0.008794
1 addition takes 0.008905
1 addition takes 0.010293
1 addition takes 0.010413
1 addition takes 0.010551
1 addition takes 0.010711
1 addition takes 0.011035

So according to this output, one addition takes approximately 0.0095 µs. Following the instructions on this page, I calculated that one addition takes about 25 CPU cycles. Is this a normal value, and if so, why? The assembly instruction ADD takes only 1-2 CPU cycles.
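For reference, the cycle estimate is just the per-operation time multiplied by the clock frequency; the 2.6 GHz figure below is an assumption used only to make the arithmetic concrete (substitute your own CPU's clock):

# Back-of-the-envelope conversion: time per operation * clock frequency = cycles.
# The 2.6 GHz clock is assumed for illustration, not measured.
time_per_add_s = 0.0095e-6        # 0.0095 microseconds per addition
clock_hz = 2.6e9                  # assumed 2.6 GHz CPU clock
print(time_per_add_s * clock_hz)  # ~24.7 cycles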

Arsen
  • Also, the addition isn't even *happening*, because it gets optimized out. – user2357112 Mar 31 '16 at 16:06
  • OK, but if I run more than 1000 additions it doesn't affect it, I guess. – Arsen Mar 31 '16 at 16:06
  • The for loop inside `test` certainly affects the timing regardless of the number of iterations. – Michael Mar 31 '16 at 16:08
  • Your code for calculating time for one iteration of the loop ("1 addition") is wrong. It should be `times[keys[i]] / keys[i]`. Your calculation for how many cycles correspond to 0.0095 seconds is also wrong. A CPU running at 2 GHz executes 19,000,000 cycles in 0.0095 seconds. Python is slow, very slow. Its operations take orders of magnitude longer than the equivalent assembly instructions. – Ross Ridge Mar 31 '16 at 17:44
  • @ross, oh, it's just a typo. I meant usec instead of sec. `((times[keys[i+1]] - times[keys[i]]) / (keys[i+1] - keys[i]))` is not wrong; by dividing deltas I can adjust the accuracy of the computation. – Arsen Mar 31 '16 at 18:30
  • If you want to measure clock cycles more accurately, you should use [hwcounter](https://github.com/paulsmith/hwcounter) instead, as it uses low-level assembly to get the number of *reference* clock cycles via `RDTSC` calls; a minimal sketch follows below. – not2qubit Nov 28 '22 at 03:35
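A minimal sketch of the hwcounter approach mentioned in the last comment; the count()/count_end() helpers follow the usage shown in that project's README and are an assumption here, not something verified against the library:

# Sketch based on hwcounter's README (https://github.com/paulsmith/hwcounter).
# count()/count_end() are assumed to read the TSC via RDTSC/RDTSCP; the result
# is in reference cycles and includes the measurement overhead itself.
from hwcounter import count, count_end

x = 1
start = count()
x += 1
elapsed = count_end() - start
print("x += 1 took about %d reference cycles (incl. overhead)" % elapsed)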

2 Answers

9

You are timing a function call (test()), a for loop, and a call to range(). The addition is not being timed at all.

def test(n):
    for i in range(n):
        1 + 1

import dis
dis.dis(test)

Here is the byte code for your test function (does not include the call to test()):

  2           0 SETUP_LOOP              24 (to 27)
              3 LOAD_GLOBAL              0 (range)
              6 LOAD_FAST                0 (n)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                10 (to 26)
             16 STORE_FAST               1 (i)

  3          19 LOAD_CONST               2 (2)   **** 
             22 POP_TOP             
             23 JUMP_ABSOLUTE           13
        >>   26 POP_BLOCK           
        >>   27 LOAD_CONST               0 (None)
             30 RETURN_VALUE        

**** Notice that the addition is done at compile time. Quite a few other languages and their compilers do the same, including C. However, the standards rarely define when a 1 + 1 is actually evaluated, so it is often implementation dependent.
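If you want to see that folding yourself, disassembling the bare expression is enough; the exact byte code varies between CPython versions, but no BINARY_ADD appears:

import dis

# CPython's peephole optimizer folds 1 + 1 into the constant 2 at compile
# time, so the disassembly shows only a LOAD_CONST, never a BINARY_ADD.
dis.dis(compile("1 + 1", "<string>", "eval"))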

EDIT:

Your timeit function call could be this:

    t = timeit("x += 1", setup="x = 1", number=100000)

We can create a dummy function to check the operation:

def myfunc(x):
    x += 1

import dis
dis.dis(myfunc)

Making that change gives:

1 additions takes 0.008976
2 additions takes 0.007419
4 additions takes 0.007282
8 additions takes 0.007693
16 additions takes 0.007026
32 additions takes 0.007793
64 additions takes 0.010168
128 additions takes 0.008124
256 additions takes 0.009064
512 additions takes 0.007256
1 addition takes -0.001557
1 addition takes -0.000068
1 addition takes 0.000103
1 addition takes -0.000083
1 addition takes 0.000048
1 addition takes 0.000074
1 addition takes -0.000032
1 addition takes 0.000007

 26           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (1)
              6 INPLACE_ADD
              7 STORE_FAST               0 (x)
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE

Note that x += 1 compiles to an INPLACE_ADD, unlike x = x + 1, which compiles to a BINARY_ADD, so you need to decide which opcode you want to measure.
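A quick way to see the two opcodes side by side (opcode names vary slightly between CPython versions):

import dis

def inplace_add(x):
    x += 1       # compiles to INPLACE_ADD

def rebind_add(x):
    x = x + 1    # compiles to BINARY_ADD

dis.dis(inplace_add)
dis.dis(rebind_add)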

cdarke
  • Thanks, now it is clear to me. But is there any way to calculate the addition's time more accurately? – Arsen Mar 31 '16 at 16:20
  • What are you trying to achieve? When you call `+` in a high-level object-oriented language you are adding two objects together, and the `+` involves a function call. Some languages like Java and C++ use so-called "primitives" for integers - they are not objects at all. In Python you gain considerable flexibility. High-density number-crunching Python modules, like `numpy`, do their calculations in C extensions, not native Python, mainly for performance reasons (a rough comparison is sketched after these comments). What is the purpose of your measurements? – cdarke Mar 31 '16 at 16:26
  • You should also realise that Python does other optimisations. If you are writing this kind of low-level test you should check that the byte-code is actually doing what you expect. – cdarke Mar 31 '16 at 16:32
  • See my EDIT for a suggestion, with all the caveats given in my comments. – cdarke Mar 31 '16 at 16:47
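To illustrate the point about numpy made in the comments above, here is a rough comparison; it is only a sketch, absolute numbers depend entirely on the machine, and numpy must be installed:

import timeit

# Pure-Python sum vs. numpy's vectorised sum over the same data.
# numpy performs the additions in a C loop, so the per-element cost is far lower.
setup = """
import numpy as np
data = list(range(100000))
arr = np.array(data)
"""
print("python sum:", timeit.timeit("sum(data)", setup=setup, number=100))
print("numpy sum: ", timeit.timeit("arr.sum()", setup=setup, number=100))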
6

You can get a bit more insight into what is going on behind the scenes here by using the dis module.

Specifically, the dis.dis function takes a snippet of compiled Python code and returns the byte code that said snippet is interpreted as. In the case of 1 + 1:

In [1]: import dis

In [2]: def add1and1():
    return 1 + 1

In [3]: dis.dis(add1and1)
  2           0 LOAD_CONST               2 (2)
              3 RETURN_VALUE 

So in this case, when the source code is compiled to byte code, the operation 1 + 1 is performed once at compile time and the result is stored as a constant. We can get around this by returning the sum of parameters passed to the function:

In [1]: import dis

In [2]: def add(x, y):
    return x + y

In [3]: dis.dis(add)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                1 (y)
              6 BINARY_ADD          
              7 RETURN_VALUE          

So the byte code instruction you are actually interested in is BINARY_ADD. If you want to find out more about it, you can find the relevant section in the CPython interpreter's ceval.c file:

TARGET(BINARY_ADD) {
    PyObject *right = POP();
    PyObject *left = TOP();
    PyObject *sum;
    if (PyUnicode_CheckExact(left) &&
             PyUnicode_CheckExact(right)) {
        sum = unicode_concatenate(left, right, f, next_instr);
        /* unicode_concatenate consumed the ref to v */
    }
    else {
        sum = PyNumber_Add(left, right);
        Py_DECREF(left);
    }
    Py_DECREF(right);
    SET_TOP(sum);
    if (sum == NULL)
        goto error;
    DISPATCH();
}

So there's more going on here than you might have originally expected. We have:

  1. a conditional to determine whether we are using BINARY_ADD for string concatenation or for adding numeric types

  2. the actual call to PyNumber_Add where one might have expected something more along the lines of left + right

Both of these points are explained by Python's dynamic nature; since Python doesn't know the type of x or y until you actually call add, the type checking is done at run time instead of compile time. There are clever optimizations that can be made in dynamic languages to get around this (see V8 for JavaScript or PyPy for Python) but generally speaking this is the price you pay for the flexibility of an interpreted, dynamically typed language.
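To make that concrete, the very same BINARY_ADD byte code services every type that implements `+`; the dispatch to the right `__add__` happens at run time:

def add(x, y):
    return x + y

# One byte-code instruction, many behaviours: the type check inside
# BINARY_ADD / PyNumber_Add picks the right __add__ at run time.
print(add(1, 2))        # 3
print(add(1.5, 2.5))    # 4.0
print(add("a", "b"))    # ab
print(add([1], [2]))    # [1, 2]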

chucksmash