What explains the different performance of creating objects via normal class, dataclass and namedtuple?

Question

I was going through data classes and named tuples, and I noticed that creating objects with these different Python features shows different performance.

dataclass:

In [1]: from dataclasses import dataclass
   ...:
   ...: @dataclass
   ...: class Position:
   ...:     lon: float = 0.0
   ...:     lat: float = 0.0
   ...:

In [2]: %timeit for _ in range(1000): Position(12.5, 345)
326 µs ± 34.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Normal class:

In [1]: class Position:
   ...:
   ...:     def __init__(self, lon=0.0, lat=0.0):
   ...:         self.lon = lon
   ...:         self.lat = lat
   ...:

In [2]: %timeit for _ in range(1000): Position(12.5, 345)
248 µs ± 2.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

namedtuple:

In [1]: from collections import namedtuple

In [2]: Position = namedtuple("Position", ["lon","lat"], defaults=[0.0,0.0])

In [3]: %timeit for _ in range(1000): Position(12.5, 345)
286 µs ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Python version: 3.7.3
OS: MacOS Mojave

All implementations have the same object attributes and the same default values.

Why is the trend time(dataclass) > time(namedtuple) > time(normal class)?
What does each implementation do that accounts for its respective time?
Which implementation performs best in which scenario?

Here, time denotes the time taken to create objects.

score 3 · Answer 1 · edited Feb 09 '22 at 15:01

In Python, most objects keep their attributes in a dict. A dataclass has more entries in its class namespace than a hand-written class, and my first guess was that filling those in accounts for the extra time.

How does that difference arise? @Arne's comment pointed out that I was missing something here, so I wrote some sample code:

from dataclasses import dataclass
import time

@dataclass
class Position:
    lon: float = 0.0
    lat: float = 0.0


start_time = time.time()
for i in range(100000):
    p = Position(lon=1.0, lat=1.0)
elapsed = time.time() - start_time
print(f"dataclass {elapsed}")
print(dir(p))


class Position2:
    lon: float = 0.0
    lat: float = 0.0

    def __init__(self, lon, lat):
        self.lon = lon
        self.lat = lat


start_time = time.time()
for i in range(100000):
    p = Position2(lon=1.0, lat=1.0)
elapsed = time.time() - start_time
print(f"just class {elapsed}")
print(dir(p))

start_time = time.time()
for i in range(100000):
    p = {"lon": 1.0, "lat": 1.0}
elapsed = time.time() - start_time
print(f"dict {elapsed}")

With results:

/usr/bin/python3.8 ...../test.py
dataclass 0.16358232498168945
['__annotations__', '__class__', '__dataclass_fields__', '__dataclass_params__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'lat', 'lon']
just class 0.1495649814605713
['__annotations__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'lat', 'lon']
dict 0.028212785720825195

Process finished with exit code 0

The dict example is included for reference.
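A single `time.time()` measurement like the ones above is sensitive to noise, so the gap between the dataclass and the plain class may partly be jitter. The sketch below repeats the comparison with `timeit` and keeps the fastest of several runs. The names `DataPosition`, `PlainPosition`, `TuplePosition` and `best_time` are made up for this illustration, and the numbers will differ between machines and Python versions.

```python
import timeit
from collections import namedtuple
from dataclasses import dataclass


@dataclass
class DataPosition:
    lon: float = 0.0
    lat: float = 0.0


class PlainPosition:
    def __init__(self, lon=0.0, lat=0.0):
        self.lon = lon
        self.lat = lat


TuplePosition = namedtuple("TuplePosition", ["lon", "lat"], defaults=[0.0, 0.0])


def best_time(factory, number=20_000, repeat=5):
    """Return the fastest of `repeat` runs, each creating `number` objects."""
    # The lambda adds the same call overhead to every candidate,
    # so the results remain comparable with each other.
    timer = timeit.Timer(lambda: factory(12.5, 345))
    return min(timer.repeat(repeat=repeat, number=number))


results = {
    "dataclass": best_time(DataPosition),
    "plain class": best_time(PlainPosition),
    "namedtuple": best_time(TuplePosition),
}

for name, seconds in results.items():
    print(f"{name:12s} {seconds:.4f} s")
```

Taking the minimum rather than the mean is a common choice for micro-benchmarks, because slower runs are usually caused by other activity on the machine rather than by the code under test.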

I looked into the `dataclasses` module, where this function:

def _init_fn(fields, frozen, has_post_init, self_name, globals):

is responsible for creating the constructor. As Arne pointed out, the `__post_init__` call is optional and is not generated unless you define one. My other idea was that there might be some extra work around the fields, but:

In [5]: p = Position(lat=1.1, lon=2.2)

In [7]: p.lat.__class__
Out[7]: float

so there are no additional wrappers or extra code here. From all of that, the only additional thing I could see is that the dataclass has more methods.
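The observations above can be checked directly. This is a rough sketch, and the class names (`DataPosition`, `PlainPosition`, `HookedPosition`) are invented for the illustration. It uses `vars()` rather than `dir()` because `dir()` also lists names inherited from `object`, which hides methods such as `__eq__` and `__repr__` that the decorator defines on the class itself. It also inspects the code object of the generated `__init__` to see whether `__post_init__` is referenced.

```python
from dataclasses import dataclass


@dataclass
class DataPosition:
    lon: float = 0.0
    lat: float = 0.0


class PlainPosition:
    lon: float = 0.0
    lat: float = 0.0

    def __init__(self, lon=0.0, lat=0.0):
        self.lon = lon
        self.lat = lat


@dataclass
class HookedPosition:
    lon: float = 0.0
    lat: float = 0.0

    def __post_init__(self):
        pass


# Names that exist in the dataclass namespace but not in the plain class.
# The exact list depends on the Python version.
extra_class_names = sorted(set(vars(DataPosition)) - set(vars(PlainPosition)))
print("extra class-level names:", extra_class_names)

# The per-instance dicts hold the same keys in both cases.
same_instance_keys = (
    vars(DataPosition(1.0, 2.0)).keys() == vars(PlainPosition(1.0, 2.0)).keys()
)
print("same instance keys:", same_instance_keys)

# Attribute names referenced by each generated __init__.
init_names = DataPosition.__init__.__code__.co_names
hooked_init_names = HookedPosition.__init__.__code__.co_names
print("__init__ without hook references:", init_names)
print("__init__ with hook references:   ", hooked_init_names)
```

The extra entries live on the class, which is built once, while each instance still stores only `lon` and `lat`.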

Thank you. I will go through it. I have edited the question. — bigbounty, Jul 15 '20 at 12:17
@bigbounty I would add simple dictionary as well - if we talk about performance ;) — Michał Zaborowski, Jul 15 '20 at 12:54
If you do not write a `__post_init__` yourself, the call to it won't be generated. — Arne, Jul 16 '20 at 14:28
@Arne you are right - I've updated response. Hope now it is OK :) — Michał Zaborowski, Jul 17 '20 at 12:12
It would be interesting to re-run your tests with dataclasses that take advantage of "slots=True" — wakey, Jun 08 '23 at 05:39
