2

I was playing around with the `sum` function and observed the following behaviour.

case 1:

source = """
class A:
    def __init__(self, a):
        self.a = a
    
    def __add__(self, other):
        return self.a + other

sum([*range(10000)], start=A(10))
"""

import timeit
print(timeit.timeit(stmt=source))

As you can see, I am passing an instance of a custom class as the `start` argument to the `sum` function. Benchmarking the above code takes around 192.60747704200003 seconds on my system.

case 2:

source = """
class A:
    def __init__(self, a):
        self.a = a
    
    def __add__(self, other):
        return self.a + other

sum([*range(10000)], start=10)  # <- here
"""

import timeit
print(timeit.timeit(stmt=source))

But if I remove the custom class instance and pass an int object directly, it takes only 111.48285191600007 seconds. I am curious to understand the reason for this speed difference.

My system info:

>>> import platform
>>> platform.platform()
'macOS-12.5-arm64-arm-64bit'
>>> import sys
>>> sys.version
'3.11.0 (v3.11.0:deaf509e8f, Oct 24 2022, 14:43:23) [Clang 13.0.0 (clang-1300.0.29.30)]'
Abdul Niyas P M
  • Adding two ints can be done in a single CPU instruction. Calling a method to add two integers together requires more computation – byxor Nov 18 '22 at 12:19
  • @byxor But `A(10) + 0` returns an `int` and not an instance of `A` (`return self.a + other`). So at first glance the method is just called once, and that shouldn't make such a difference? – Timus Nov 18 '22 at 12:23
  • @byxor Adding two ints in Python is much more than one CPU instruction; Python is not C. Anyway, the `__add__` method gets called only once, for the first addition; after that we only add integers and are in the same situation as the plain-int example. The main difference comes from having to create and initialize the `A(10)` instance, which one can see by using a `range(0)`. Using a `range(1)` will add the time needed for the first addition, which is the second largest reason for the custom-class code being slower. – Thierry Lathuille Nov 18 '22 at 12:24
  • Yes, the only point I'm making is that case 2 doesn't require as much computation (and generally runs faster), regardless of the python implementation – byxor Nov 18 '22 at 12:29
  • @ThierryLathuille I don't think that creating and initializing one `A` is the main difference. The main reason is that the custom type for `start` causes the builtin `sum` to miss a CPython fast-path optimization. – wim Nov 18 '22 at 20:35

2 Answers

5

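`builtin_sum_impl` (the C function behind `sum` in CPython) has two code paths. The fast path is taken when `start` is exactly an `int` (or a `float`): it accumulates the total in a C variable and skips creating an intermediate Python number object for every addition.

The slower, generic path is used when `start` is any other type. It performs every addition through the generic number protocol (which is what ends up calling `__add__`), because it assumes you may be summing arbitrary objects.

By passing `A(10)` as `start`, you forced it onto the slower path, even though every addition after the first one involves only ints.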
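One way to check this on your own machine is to move the class definition and the list construction out of the timed statement, so that only the `sum` call itself is measured. The following is a minimal sketch, not a rigorous benchmark: the names `data` and `start_obj` and the repeat count of 200 are arbitrary choices for illustration, and the absolute numbers will vary by machine and Python version.

```python
import timeit


class A:
    def __init__(self, a):
        self.a = a

    def __add__(self, other):
        # Returns a plain int, so only the first addition goes through A.
        return self.a + other


# Build the inputs once, outside the timed code.
data = list(range(10000))
start_obj = A(10)

# Both calls compute the same value.
assert sum(data, start=10) == sum(data, start=start_obj)

# Time only the sum() call; 200 repetitions is an arbitrary choice.
t_int = timeit.timeit(lambda: sum(data, start=10), number=200)
t_obj = timeit.timeit(lambda: sum(data, start=start_obj), number=200)

print(f"start=10:    {t_int:.4f} s")
print(f"start=A(10): {t_obj:.4f} s")
```

If the gap between the two timings persists with the instance created ahead of time, then constructing `A(10)` cannot be what accounts for it.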

wim
Ahmed AEK
-1

Maybe looking at the byte-code can help understand what happens. If you run

import dis

def test_range():
    class A:
        def __init__(self, a):
            self.a = a

        def __add__(self, other):
            return self.a + other

    sum([*range(10000)], start=10)

dis.dis(test_range)

you get the byte-code for the `start=10` version. Compared with it, the version with `start=A(10)` generates 2 more instructions:

2 LOAD_CONST               1 (<code object A at 0x7ff0bfa25c90, file "/.../main.py", line 5>)
...
26 LOAD_CONST               4 (10)
28 LOAD_CONST               5 (('start',))
30 CALL_FUNCTION_KW         2
32 POP_TOP
34 LOAD_CONST               0 (None)
36 RETURN_VALUE

vs

2 LOAD_CONST               1 (<code object A at 0x7ff0bfa25c90, file "/.../main.py", line 5>)
...
26 LOAD_FAST                0 (A)       <--- here
28 LOAD_CONST               4 (10)
30 CALL_FUNCTION            1           <--- and here
32 LOAD_CONST               5 (('start',))
34 CALL_FUNCTION_KW         2
36 POP_TOP
38 LOAD_CONST               0 (None)
40 RETURN_VALUE

The complete byte-code for the version with `start=A(10)` is here.

My (limited) understanding is that those 2 lines point to the initialization of A. Please, someone confirm.
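A small sketch that can help confirm this programmatically: it compares the values loaded by the two variants using `dis.get_instructions`. The helper `loaded_values` and the two wrapper functions are hypothetical names introduced here, and `A` is defined at module level (not inside the function as above), so it is loaded as a global. Opcode names differ between Python versions, so the sketch compares loaded values instead of exact opcodes.

```python
import dis


class A:
    def __init__(self, a):
        self.a = a

    def __add__(self, other):
        return self.a + other


def with_int():
    return sum([*range(10000)], start=10)


def with_instance():
    return sum([*range(10000)], start=A(10))


def loaded_values(func):
    # Collect the names and constants pushed by LOAD_* instructions.
    return {
        ins.argval
        for ins in dis.get_instructions(func)
        if ins.opname.startswith("LOAD_")
    }


# What the A(10) variant loads that the plain-int variant does not.
extra = loaded_values(with_instance) - loaded_values(with_int)
print(extra)

n_int = len(list(dis.get_instructions(with_int)))
n_obj = len(list(dis.get_instructions(with_instance)))
print(n_int, n_obj)
```

The extra instructions load `A` and call it with `10`, so they do correspond to creating the instance. They run once per call to the function, though, not once per element being summed.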

payloc91
  • This is not the reason. A couple of ops in the disassembly is not going to make a meaningful difference. All of the calculation happens inside that one CALL_FUNCTION_KW op, which is referring to the builtin [sum](https://docs.python.org/3/library/functions.html#sum). – wim Nov 18 '22 at 20:27