Control initialization order when a Python dataclass inherits from a class

Question

What I know
Python dataclasses support inheritance, from either another dataclass or a regular class. As a best practice (in Python and in other languages), a subclass should call the parent's initializer first. In Python that looks like:

def __init__(self):
    super().__init__()
    ...

What I'm doing
Since dataclasses were introduced in Python 3.7, I have been considering replacing all of my classes with dataclasses. One benefit of a dataclass is that it generates __init__ for you. This becomes a problem when the dataclass needs to inherit from a base class, for example:

from dataclasses import dataclass

class Base:
    def __init__(self):
        self.a = 1

@dataclass
class Child(Base):
    a: int

    def __post_init__(self):
        super().__init__()

My problem
The problem is that we have to put the super initialization call inside __post_init__, which is in fact called after the dataclass's generated __init__.
The downside is that we lose the usual convention, and because the initialization runs out of order, we cannot override attributes of the superclass.
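
A minimal sketch of the effect described above, reusing the `Base` and `Child` from the question. The value `5` is only an illustrative argument: because `Base.__init__` runs last, it overwrites whatever was passed to `Child`.

```python
from dataclasses import dataclass


class Base:
    def __init__(self):
        self.a = 1


@dataclass
class Child(Base):
    a: int

    def __post_init__(self):
        # Runs after the generated __init__ has already assigned self.a
        super().__init__()


c = Child(a=5)
print(c.a)  # 1: Base.__init__ ran last and overwrote the value passed in
```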

This could be solved by something like a __pre_init__ hook. I've read the documentation and don't see anything related to that concept there. Am I missing something?

What do you mean with `we can not override attributes of super classes`? Do you want to have an attribute in the child class with the same name as in the parent class? Like, `class A: some_name: int` and `class B(A): some_name: int`? — Arne, Mar 11 '19 at 07:42
@Arne Yes, attribute in the same name. In Scala or Java, parent always init first over child. and child attributes can always override parent after init. — WeiChing 林煒清, Mar 12 '19 at 02:13
Ok, I think I understand. But I think that this kind of principle doesn't apply to the problem at hand. I used a few more words in [my answer](https://stackoverflow.com/a/55097166/962190). — Arne, Mar 12 '19 at 07:12

score 15 · Accepted Answer · edited Mar 31 '20 at 21:25

Actually, there is one method that is called before __init__: __new__. So you can use this trick: call Base.__init__ in Child.__new__. I can't say whether it is a good solution, but if you're interested, here is a working example:

from dataclasses import dataclass


class Base:
    def __init__(self, a=1):
        self.a = a


@dataclass
class Child(Base):
    a: int

    def __new__(cls, *args, **kwargs):
        obj = object.__new__(cls)
        Base.__init__(obj, *args, **kwargs)
        return obj


c = Child(a=3)
print(c.a)  # 3, not 1, because Child.__init__ overrides a

edited Mar 31 '20 at 21:25 by TaylorMonacelli
answered Feb 28 '19 at 20:06 by sanyassh

One drawback to this, though, is that you must clutter client code with a default value, ```c = Child(a=1)```, because ```c = Child()``` gives the error ```TypeError: __init__() missing 1 required positional argument: 'a'```. With this, you're losing the benefit of setting the default in ```def __init__(self, a=1)```. – TaylorMonacelli Mar 31 '20 at 21:36
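
One possible way around the drawback raised in this comment, as a sketch only: repeating the default on the `Child` field (an assumption, the answer itself does not do this) lets `Child()` work while `Child(a=3)` still overrides the value.

```python
from dataclasses import dataclass


class Base:
    def __init__(self, a=1):
        self.a = a


@dataclass
class Child(Base):
    a: int = 1  # assumed: repeat Base's default so callers may omit it

    def __new__(cls, *args, **kwargs):
        obj = object.__new__(cls)
        Base.__init__(obj, *args, **kwargs)  # Base runs before Child.__init__
        return obj


print(Child().a)     # 1
print(Child(a=3).a)  # 3
```

The cost is that the default now lives in two places, so the two values have to be kept in sync by hand.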

Arne · Answer 2 · 2019-03-11T13:34:30.490

In best practice [...], when we do inheritance, the initialization should be called first.

This is a reasonable best practice to follow, but in the particular case of dataclasses, it doesn't make any sense.

There are two reasons for calling a parent's constructor, 1) to instantiate arguments that are to be handled by the parent's constructor, and 2) to run any logic in the parent constructor that needs to happen before instantiation.

Dataclasses already handle the first one for us:

from dataclasses import dataclass

@dataclass
class A:
    var_1: str

@dataclass
class B(A):
    var_2: str

print(B(var_1='a', var_2='b'))  # prints: B(var_1='a', var_2='b')
# 'var_1' got handled without us needing to do anything

And the second one does not apply to dataclasses. Other classes might do all kinds of strange things in their constructor, but dataclasses do exactly one thing: they assign the input arguments to their attributes. If they need to do anything else (that can't be handled by a __post_init__), you might be writing a class that shouldn't be a dataclass.
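
To illustrate the same-name case discussed in the comments on the question, here is a small sketch. The field names `some_name` and `other` and their defaults are made up for the example. When both parent and child are dataclasses, the child can redeclare a field and its definition wins, with no explicit call to the parent's constructor.

```python
from dataclasses import dataclass, fields


@dataclass
class A:
    some_name: int = 1


@dataclass
class B(A):
    some_name: int = 2  # same name as in A, new default
    other: str = "x"


# The redeclared field keeps its original position in the field order
print([f.name for f in fields(B)])  # ['some_name', 'other']
print(B())                          # B(some_name=2, other='x')
print(B(some_name=5))               # B(some_name=5, other='x')
```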

score 5 · Answer 3 · answered Feb 28 '19 at 14:37

How about:

from dataclasses import dataclass


class Base:
    def __init__(self, a=1):
        self.a = a


@dataclass
class Child(Base):

    def __post_init__(self):
        super().__init__()


ch = Child()

answered Feb 28 '19 at 14:37 by naivepredictor

How is this different from what's in the question? – Patrick Haugh Feb 28 '19 at 14:38
Run the question's code and then mine, and you will see that my version does not require you to provide the argument when you instantiate Child. Apart from that, there is not much difference. – naivepredictor Feb 28 '19 at 14:45
I like this, but it prevents overriding attributes of super class...which is what @WeiChing is referring to in original question. IOW, ```ch = Child(a=3)``` and ```ch = Child(3)``` both fail. – TaylorMonacelli Mar 31 '20 at 20:46
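
A hedged variation that addresses the failure noted in this comment: declare `a` as a field on `Child` with an assumed default, and forward its value to `Base` from `__post_init__` instead of letting `Base` fall back to its own default. This is a sketch, not something proposed in the answer.

```python
from dataclasses import dataclass


class Base:
    def __init__(self, a=1):
        self.a = a


@dataclass
class Child(Base):
    a: int = 1  # assumed default, mirroring Base

    def __post_init__(self):
        # Pass the caller's value on to Base rather than Base's default
        super().__init__(a=self.a)


print(Child().a, Child(a=3).a, Child(3).a)  # 1 3 3
```

Base still runs after the generated `__init__`, but since it receives the value the caller supplied, nothing is overwritten.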

Muhammad Yasirroni · Answer 4 · 2023-05-17T07:49:03.160

Using a dataclass that inherits from a dataclass:

from dataclasses import dataclass

@dataclass
class Base:
    a: int = 1
    def __post_init__(self):
        self.b = self.a * 2

@dataclass
class Child(Base):
    def __post_init__(self):
        # super().__init__()  # this causes a RecursionError
        super().__post_init__()  # without this, self.b doesn't exist
        self.c = self.b * 5

ch = Child(a=3)
print(ch.a, ch.b, ch.c)  # Output: 3 6 30

Disclaimer: I'm still learning dataclasses and can't find anything in the docs saying whether this is recommended or not.
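
A possible refinement, offered as a sketch rather than a recommendation: declaring the derived attributes with `field(init=False)` makes them real dataclass fields, so they appear in the generated `repr` and comparisons while still being computed in `__post_init__`. The attribute names follow the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Base:
    a: int = 1
    b: int = field(init=False)  # computed, not accepted by __init__

    def __post_init__(self):
        self.b = self.a * 2


@dataclass
class Child(Base):
    c: int = field(init=False)  # computed from the parent's b

    def __post_init__(self):
        super().__post_init__()  # sets self.b first
        self.c = self.b * 5


print(Child(a=3))  # Child(a=3, b=6, c=30)
```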
