1

I'm using Python dataclasses with inheritance and I would like to make an inherited abstract property into a required constructor argument. Using an inherited abstract property as an optional constructor argument works as expected, but I've been having real trouble making the argument required.

Below is a minimal working example. test_1() fails with TypeError: Can't instantiate abstract class Child1 with abstract methods inherited_attribute, test_2() fails with AttributeError: can't set attribute, and test_3() works as expected.

Does anyone know a way I can achieve this behavior while still using dataclasses?

import abc
import dataclasses

@dataclasses.dataclass
class Parent(abc.ABC):

    @property
    @abc.abstractmethod
    def inherited_attribute(self) -> int:
        pass

@dataclasses.dataclass
class Child1(Parent):
    inherited_attribute: int

@dataclasses.dataclass
class Child2(Parent):
    inherited_attribute: int = dataclasses.field()

@dataclasses.dataclass
class Child3(Parent):
    inherited_attribute: int = None

def test_1():
    Child1(42)

def test_2():
    Child2(42)

def test_3():
    Child3(42)

Roy Smart

3 Answers

2

So, the thing is, you declared an abstract property. Not an abstract constructor argument, or an abstract instance dict entry - abc has no way to specify such things.

Abstract properties are really supposed to be overridden by concrete properties, but the abc machinery will consider it overridden if there is a non-abstract entry in the subclass's class dict.

  • Your Child1 doesn't create a class dict entry for inherited_attribute - the annotation only creates an entry in the annotation dict.
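  • Child2 does create an entry in the class dict, but then the dataclass machinery removes it, because it's a field with no default value. This changes the abstractness status of Child2, which is undefined behavior below Python 3.10, but Python 3.10 added abc.update_abstractmethods to support that, and dataclasses uses that function on Python 3.10 and later. On older versions the class is still treated as concrete, so the generated __init__ runs and its assignment hits the inherited read-only property, which is where AttributeError: can't set attribute comes from.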
  • Child3 creates an entry in the class dict, and since the dataclass machinery sees this entry as a default value, it leaves the entry there, so the abstract property is considered overridden.
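
The three cases above can be checked by looking at each class's own dict after @dataclass has run. This is a minimal sketch that reuses the classes from the question; the has_entry variable is only for illustration, and the abstractness of Child2 is deliberately not inspected because it depends on the Python version.

```python
import abc
import dataclasses


@dataclasses.dataclass
class Parent(abc.ABC):
    @property
    @abc.abstractmethod
    def inherited_attribute(self) -> int:
        ...


@dataclasses.dataclass
class Child1(Parent):
    inherited_attribute: int  # annotation only


@dataclasses.dataclass
class Child2(Parent):
    inherited_attribute: int = dataclasses.field()  # field without a default


@dataclasses.dataclass
class Child3(Parent):
    inherited_attribute: int = None  # plain default value


# Does each class have its own class dict entry once @dataclass is done?
has_entry = {
    cls.__name__: "inherited_attribute" in vars(cls)
    for cls in (Child1, Child2, Child3)
}
print(has_entry)

# Child1 never shadowed the abstract property; Child3 did.
print(Child1.__abstractmethods__)
print(Child3.__abstractmethods__)
```

Only Child3 keeps a class dict entry, which is why it is the only one that can be instantiated.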

So you've got a few courses of action here. The first is to remove the abstract property. You don't want to force your subclasses to have a property - you want your subclasses to have an accessible inherited_attribute instance attribute, and it's totally fine if this attribute is implemented as an instance dict entry. abc doesn't support that, and using an abstract property is wrong, so just document the requirement instead of trying to use abc to enforce it.

With the abstract property removed, Parent isn't actually abstract any more, and in fact doesn't really do anything, so at that point, you can just take Parent out entirely.
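
If Parent does end up carrying shared behavior, option 1 might look like the following minimal sketch. The requirement is stated in a docstring instead of being enforced by abc; the doubled method is hypothetical and only stands in for whatever concrete methods Parent would have.

```python
import dataclasses


class Parent:
    """Base class with shared behavior.

    Subclasses are expected to provide an integer attribute named
    inherited_attribute. This is documented, not enforced.
    """

    def doubled(self) -> int:  # hypothetical method, for illustration only
        return 2 * self.inherited_attribute


@dataclasses.dataclass
class Child(Parent):
    inherited_attribute: int  # required constructor argument


child = Child(21)
print(child.doubled())

try:
    Child()  # the argument is required
    missing_argument_error = None
except TypeError as exc:
    missing_argument_error = exc
```

A static type checker will probably complain about self.inherited_attribute inside Parent, since nothing declares it there.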


Option 2, if you really want to stick with the abstract property, would be to give your subclasses a concrete property, properly overriding the abstract property:

@dataclasses.dataclass
class Child(Parent):
    _hidden_field: int
    @property
    def inherited_attribute(self):
        return self._hidden_field

This would require you to give the field a different name from the attribute name you wanted, with consequences for the constructor argument names, the repr output, and anything else that cares about field names.
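
To make those consequences concrete, here is a short sketch of how the renamed field shows up. It reuses Parent from the question and the Child class above; nothing else is assumed.

```python
import abc
import dataclasses


@dataclasses.dataclass
class Parent(abc.ABC):
    @property
    @abc.abstractmethod
    def inherited_attribute(self) -> int:
        ...


@dataclasses.dataclass
class Child(Parent):
    _hidden_field: int

    @property
    def inherited_attribute(self) -> int:
        return self._hidden_field


# The keyword argument is the field name, not the property name.
child = Child(_hidden_field=42)
print(child.inherited_attribute)

# The repr and the field list also use the hidden name.
child_repr = repr(child)
field_names = [f.name for f in dataclasses.fields(child)]
print(child_repr)
print(field_names)
```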


The third option is to get something else into the class dict to shadow the inherited_attribute name, in a way that doesn't get treated as a default value. Python 3.10 added slots support in dataclasses, so you could do

@dataclasses.dataclass(slots=True)
class Child(Parent):
    inherited_attribute: int

and the generated slot descriptor would shadow the abstract property, without being treated as a default value. However, this would not give the usual memory savings of slots, because your classes inherit from Parent, which doesn't use slots.


Overall, I would recommend option 1. Abstract properties don't mean what you want, so just don't use them.

user2357112
  • Thanks for your very informative answer! Regarding your recommendation to use Option 1: what if I have concrete methods in the `Parent` class that use `inherited_attribute`? Don't I need to define something so that using an abstract class makes sense? – Roy Smart Oct 26 '22 at 01:35
  • @RoySmart: As far as the Python interpreter itself cares, no. You can just write a bunch of methods in `Parent` that use the attribute, and as long as the instance actually has that attribute at runtime, it'll work fine. If you're worried about static type checkers, you can [annotate `self` with a protocol](https://mypy.readthedocs.io/en/latest/more_types.html#mixin-classes) in `Parent`. (That link talks about mixins, but it should work whether or not your class is intended as a mixin.) – user2357112 Oct 26 '22 at 05:50
2

Answering my own question, since I just found an option other than those listed in @user2357112's excellent answer.

What seems to work is setting the default value of the field to dataclasses.MISSING, as in the following example:

@dataclasses.dataclass
class Child4(Parent):
    inherited_attribute: int = dataclasses.MISSING

This might be better than @user2357112's Option 3 since it actually raises a TypeError: Child4.__init__() missing 1 required positional argument: 'inherited_attribute' if the value of inherited_attribute is missing, instead of silently setting it to the property Parent.inherited_attribute.

This is probably more of a hack than a real solution since the documentation of dataclasses.field() says that "No code should directly use the MISSING value."
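
A short sketch of the behavior described above, assuming the Parent class from the question. The error variable is only there so the outcome can be inspected.

```python
import abc
import dataclasses


@dataclasses.dataclass
class Parent(abc.ABC):
    @property
    @abc.abstractmethod
    def inherited_attribute(self) -> int:
        ...


@dataclasses.dataclass
class Child4(Parent):
    inherited_attribute: int = dataclasses.MISSING


child = Child4(42)
print(child.inherited_attribute)

# Omitting the argument is an error, so the field really is required.
error = None
try:
    Child4()
except TypeError as exc:
    error = exc
print(error)

# Side effect of the hack: the sentinel stays behind as a class attribute.
print(Child4.inherited_attribute is dataclasses.MISSING)
```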

Roy Smart
2

TL;DR: Use the MRO and a stub class to set a class attribute that shadows the abstract property, which guarantees that abstractness is removed for the desired properties. Set that attribute to dataclasses.MISSING, as suggested by Roy, so that dataclass does not actually create a default value.

Original post example (adapted):

import abc
import dataclasses
from typing import cast

@dataclasses.dataclass
class Parent(abc.ABC):

    @property
    @abc.abstractmethod
    def inherited_attribute(self) -> int:
        pass

class _ParentImpl:
    # this hides the abstract method in the final class 
    # so it cannot "zombie" back if there is no default value for field
    inherited_attribute: int = cast(int, dataclasses.MISSING)

@dataclasses.dataclass
class Child1(_ParentImpl, Parent):
    inherited_attribute: int

@dataclasses.dataclass
class Child2(_ParentImpl, Parent):
    inherited_attribute: int = dataclasses.field()

@dataclasses.dataclass
class Child3(_ParentImpl, Parent):
    inherited_attribute: int = None

def test_1():
    Child1(42)

def test_2():
    Child2(42)

def test_3():
    Child3(42)

Keep reading to understand details if you wish...

It took me some digging, but I found out why Roy's answer with dataclasses.MISSING seems to work. It has to do with the fact that dataclasses store field default values (not default_factory) in the class attributes. As an illustrative example:

from dataclasses import dataclass, field

@dataclass
class MyClass:
    foo: float
    bar: float = field(default=0.0)
    baz: float = 0.0

In the final class (after dataclass has done its processing), both bar and baz will be class attributes with value 0.0. (The field metadata is moved into __dataclass_fields__.) The field foo does not show up as a class attribute, since it's only an annotation (and not in __dict__).

Now, consider this modified example

@dataclass
class MyBaseClass:
    foo: float = field()
    bar: float = field(default=1.0)


@dataclass
class MyClass(MyBaseClass):
    baz: float = 0.0

Now, foo is not just an annotation in the base class, but an actual class attribute (as written). But since there is no default, dataclass deletes this attribute, because it is of no use to derived classes (it does not hold a default value). The key to Roy's answer is that a missing default is represented by the special value dataclasses.MISSING.

In contrast, bar holds an actual field default. This is transferred to the class attribute as MyBaseClass.bar = 1.0.

So, we have the following rules:

  1. If a field is specified by annotation only, there is no resulting class attribute (in that concrete class)
  2. If a field is specified with annotation and direct default value (non-field object), then this value remains as the final value of the class attribute. This is true even if the value is dataclasses.MISSING.
  3. If a field is specified as a dataclasses.field object, then if it has no default (default=dataclasses.MISSING), the class attribute is deleted. Otherwise, the default value is transferred to the class attribute during the dataclass construction process.
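
The rules above can be observed on a single class. This is a minimal sketch; the Rules class and its field names are hypothetical and exist only to exercise each case.

```python
import dataclasses
from dataclasses import dataclass, field


@dataclass
class Rules:  # hypothetical class, one field per case
    a: float                        # rule 1: annotation only
    b: float = dataclasses.MISSING  # rule 2: bare value, here the sentinel itself
    c: float = field()              # rule 3: field object without a default
    d: float = field(default=1.0)   # rule 3: field object with a default
    e: float = 0.0                  # rule 2: ordinary bare default


own = vars(Rules)
print("a" in own)                       # rule 1: no class attribute
print(own["b"] is dataclasses.MISSING)  # rule 2: the value stays as written
print("c" in own)                       # rule 3: attribute deleted
print(own["d"], own["e"])               # defaults kept as class attributes

# a, b and c are all required; d and e fall back to their defaults.
r = Rules(1.0, 2.0, 3.0)
print(r)
```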

Now, we must examine the interaction of this mechanism with abstract properties. Given

from abc import ABC, abstractmethod
from dataclasses import dataclass

class MyBaseClass(ABC):
    @property
    @abstractmethod
    def foo(self) -> float:
        ...

@dataclass
class MyClass(MyBaseClass):
    foo: float

What happens in this particular case is that the dataclass parser looks at the base class and finds the property MyBaseClass.foo. It then interprets this as the default value of dataclass field foo. As you might imagine, this will not end well. So, we have to prevent this from happening by providing a new default value. Using Roy's solution

@dataclass
class MyClass(MyBaseClass):
    foo: float = dataclasses.MISSING

Now, there is a new default for foo, so MyBaseClass.foo is not read. Furthermore, when abc.update_abstractmethods is called at the end of the dataclass process, it no longer finds an abstract foo on the class, so foo is properly removed from the abstract method list. However, because dataclasses.MISSING is a special sentinel value, no actual default for foo is added to the __init__ method, or anywhere else.
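
Both situations can be confirmed by asking dataclasses.fields() which default each field ended up with. This is a sketch under the same definitions as above; the class names AnnotationOnly and WithMissing are made up to keep the two variants side by side.

```python
import dataclasses
from abc import ABC, abstractmethod
from dataclasses import dataclass


class MyBaseClass(ABC):
    @property
    @abstractmethod
    def foo(self) -> float:
        ...


@dataclass
class AnnotationOnly(MyBaseClass):  # hypothetical name
    foo: float


@dataclass
class WithMissing(MyBaseClass):  # hypothetical name
    foo: float = dataclasses.MISSING


# The annotation-only field picked up the inherited property as its default.
annotation_default = dataclasses.fields(AnnotationOnly)[0].default
print(annotation_default is MyBaseClass.foo)

# With Roy's solution the field has no default at all.
missing_default = dataclasses.fields(WithMissing)[0].default
print(missing_default is dataclasses.MISSING)

print(AnnotationOnly.__abstractmethods__)
print(WithMissing.__abstractmethods__)
```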

However, if we do instead:

@dataclass
class MyClass(MyBaseClass):
    foo: float = field()
    

The default value of foo is still dataclasses.MISSING, but this is not transferred to the final class attribute according to rule 3 above. Rather, the attribute is deleted (on the concrete class) and the abstract property persists as the exposed foo according to the MRO.
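
A small sketch of that outcome, under the same definitions. The exact exception depends on the Python version, so the code records its type instead of assuming one; the outcome variable is only for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class MyBaseClass(ABC):
    @property
    @abstractmethod
    def foo(self) -> float:
        ...


@dataclass
class MyClass(MyBaseClass):
    foo: float = field()


# The Field entry was deleted from the concrete class...
print("foo" in vars(MyClass))
# ...so attribute lookup falls through to the abstract property.
print(MyClass.foo is MyBaseClass.foo)

try:
    MyClass(2.0)
    outcome = "constructed"
except (TypeError, AttributeError) as exc:
    # TypeError on Python 3.10+ (the class is abstract again),
    # AttributeError on older versions (the read-only property blocks assignment).
    outcome = type(exc).__name__
print(outcome)
```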

Now, why don't we use the former (Roy's) solution rather than the latter? We could imagine we need a field object to specify something like default_factory, repr behavior, etc. The bare default value solution is rather limiting. So, here is a workaround that lets you use field() in this construct.

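import abc
import dataclasses
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

class MyBaseClass(ABC):
    @property
    @abstractmethod
    def foo(self) -> float:
        ...

@dataclass
class MyClass(MyBaseClass):
    foo: float = field(repr=False, default_factory=lambda: 1.0)  # contrived need for field()

# dataclass deleted the class attribute foo (rule 3), so put MISSING back to
# shadow the abstract property, then refresh the registration of abstractness
# (abc.update_abstractmethods requires Python 3.10+)
MyClass.foo = dataclasses.MISSING
abc.update_abstractmethods(MyClass)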

Result:

x = MyClass(foo=2.0)
print(x.foo)  # 2.0
y = MyClass()
print(y.foo)  # 1.0

This is rather ugly as a general practice, so I recommend wrapping it up in some kind of decorator. I rolled my own pre and post decorators that fix annotation-only fields and apply the dataclasses.MISSING resolution after dataclass creation.

Note that this is only an issue if you do not provide a default value for the field (default_factory does not count).

Edit: you can also abuse the MRO to fix this by creating a trivial base class that lists each field meant to override an abstract property as a class attribute equal to dataclasses.MISSING. This class should come first in the MRO, before any abstract classes, so that the "default" is resolved correctly. This is perhaps more straightforward because you can list all the fields you are overriding rather than having to pay attention to which have defaults. (Any actual defaults will be overridden in the concrete dataclass.)

Here is a working example to demonstrate this (with appropriate typing cast fixes to pass mypy and other type checking). Note it works well with multiple levels of inheritance. Note also that stub classes are not processed with @dataclass.

import dataclasses
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import cast


class MyBaseClass(ABC):
    @property
    @abstractmethod
    def foo(self) -> float:
        ...

    @property
    @abstractmethod
    def bar(self) -> float:
        ...

class _MyClass:
    foo: float = cast(float, dataclasses.MISSING)


@dataclass
class MyClass(_MyClass, MyBaseClass, ABC):
    foo: float = field(hash=False)


class _MyChildClass:
    bar: float = cast(float, dataclasses.MISSING)


@dataclass
class MyChildClass(_MyChildClass, MyClass):
    bar: float


if __name__ == '__main__':
    x = MyChildClass(1.0, 2.0)
    print(x)
bdavis
  • Wow, very interesting analysis! Good job getting `dataclasses.field()` to work, that will be really useful. – Roy Smart Aug 23 '23 at 03:37