
I was comparing data classes and named tuples, and I found that creating objects with these different Python features has different performance.

dataclass:

In [1]: from dataclasses import dataclass
   ...:
   ...: @dataclass
   ...: class Position:
   ...:     lon: float = 0.0
   ...:     lat: float = 0.0
   ...:

In [2]: %timeit for _ in range(1000): Position(12.5, 345)
326 µs ± 34.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Normal class:

In [1]: class Position:
   ...:
   ...:     def __init__(self, lon=0.0, lat=0.0):
   ...:         self.lon = lon
   ...:         self.lat = lat
   ...:

In [2]: %timeit for _ in range(1000): Position(12.5, 345)
248 µs ± 2.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

namedtuple:

In [1]: from collections import namedtuple

In [2]: Position = namedtuple("Position", ["lon", "lat"], defaults=[0.0, 0.0])

In [3]: %timeit for _ in range(1000): Position(12.5, 345)
286 µs ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

  • Python version: 3.7.3
  • OS: macOS Mojave

All three implementations have the same attributes and the same default values.

  1. Why do the timings follow the trend time(dataclass) > time(namedtuple) > time(normal class)?
  2. What does each implementation do that accounts for its respective time?
  3. Which implementation performs best in which scenario?

Here, time denotes the time taken to create objects.

mkrieger1
bigbounty

1 Answer


In Python, almost everything is backed by a dict. A data class has more entries in that dict, so my first thought was that it takes more time to put them there.

How does that difference arise? @Arne's comment pointed out that I was missing something here, so I wrote some sample code:

from dataclasses import dataclass
import time

@dataclass
class Position:
    lon: float = 0.0
    lat: float = 0.0


start_time = time.time()
for i in range(100000):
    p = Position(lon=1.0, lat=1.0)
elapsed = time.time() - start_time
print(f"dataclass {elapsed}")
print(dir(p))


class Position2:
    lon: float = 0.0
    lat: float = 0.0

    def __init__(self, lon, lat):
        self.lon = lon
        self.lat = lat


start_time = time.time()
for i in range(100000):
    p = Position2(lon=1.0, lat=1.0)
elapsed = time.time() - start_time
print(f"just class {elapsed}")
print(dir(p))

start_time = time.time()
for i in range(100000):
    p = {"lon": 1.0, "lat": 1.0}
elapsed = time.time() - start_time
print(f"dict {elapsed}")

With results:

/usr/bin/python3.8 ...../test.py
dataclass 0.16358232498168945
['__annotations__', '__class__', '__dataclass_fields__', '__dataclass_params__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'lat', 'lon']
just class 0.1495649814605713
['__annotations__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'lat', 'lon']
dict 0.028212785720825195

Process finished with exit code 0

The dict example is included only for reference.
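The timings above come from a single pass measured with time.time(), so they are fairly noisy. As a rough sketch of a more repeatable comparison, the standard timeit module can run each constructor several times and keep the fastest run. The names DataPosition, PlainPosition, TuplePosition and best_time are made up for this illustration, and the absolute numbers will differ between machines and Python versions.

```python
import timeit
from collections import namedtuple
from dataclasses import dataclass


@dataclass
class DataPosition:  # hypothetical name, mirrors the dataclass above
    lon: float = 0.0
    lat: float = 0.0


class PlainPosition:  # hypothetical name, mirrors the hand-written class
    def __init__(self, lon=0.0, lat=0.0):
        self.lon = lon
        self.lat = lat


# hypothetical name, mirrors the namedtuple from the question
TuplePosition = namedtuple("TuplePosition", ["lon", "lat"], defaults=[0.0, 0.0])


def best_time(factory, number=50_000, repeat=5):
    """Return the fastest of `repeat` runs, each creating `number` objects."""
    # The lambda adds the same small call overhead to every candidate.
    timer = timeit.Timer(lambda: factory(12.5, 345))
    return min(timer.repeat(repeat=repeat, number=number))


results = {
    cls.__name__: best_time(cls)
    for cls in (DataPosition, PlainPosition, TuplePosition)
}

# Print the candidates from fastest to slowest.
for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name:15s} {seconds:.4f} s")
```

Taking the minimum of several runs is the usual way to reduce the influence of other processes on the measurement; it does not explain the differences, it only makes them easier to reproduce.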

I looked into the dataclasses source. This function:

def _init_fn(fields, frozen, has_post_init, self_name, globals):

is responsible for creating the constructor. As Arne pointed out, the __post_init__ code is optional and is not generated here. My other idea was that there is some extra work around the fields, but:

In [5]: p = Position(lat=1.1, lon=2.2)

In [7]: p.lat.__class__
Out[7]: float

so there are no additional wrappers or extra code here. After all of that, the only difference I could find is that the data class has more methods.
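To make the "more methods" observation concrete, here is a small sketch that diffs the dir() output of the two kinds of object and compares their per-instance state. DataPosition and PlainPosition are hypothetical names chosen for this example, and the exact set of extra names depends on the Python version (newer releases add a few more dataclass attributes), so treat the printed list as illustrative.

```python
from dataclasses import dataclass


@dataclass
class DataPosition:  # hypothetical name for the dataclass variant
    lon: float = 0.0
    lat: float = 0.0


class PlainPosition:  # hypothetical name, shaped like Position2 above
    lon: float = 0.0
    lat: float = 0.0

    def __init__(self, lon, lat):
        self.lon = lon
        self.lat = lat


d = DataPosition(1.0, 2.0)
p = PlainPosition(1.0, 2.0)

# Names visible on the dataclass instance but not on the plain one.
extra_names = sorted(set(dir(d)) - set(dir(p)))
print(extra_names)

# The state stored on each instance is the same two attributes.
print(vars(d), vars(p))

# None of the extra names are stored per instance; they live on the class.
extra_on_instance = [name for name in extra_names if name in vars(d)]
print(extra_on_instance)

# __post_init__ is only present if the class author defines it.
print(hasattr(DataPosition, "__post_init__"))
```

In this sketch the additional names are looked up on the class and the instance dicts are equal, so this check alone does not pin down where the extra construction time is spent.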

mkrieger1
Michał Zaborowski