The problem (and what if we need more flexibility?)
The issue is that we don't have any way to tell mypy that items will be Optional before __post_init__ but not afterward.
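To make the problem concrete, here is a minimal sketch of the kind of class under discussion. It is a reconstruction for illustration, not the original question's code; the names ClassWithState, name, and items simply follow the examples later in this answer. The code runs without error, but a type checker such as mypy is expected to object to the append call, because items is still annotated as Optional.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClassWithState:
    name: str
    # Declared Optional only so that None can mean "not supplied".
    items: Optional[List[str]] = None

    def __post_init__(self) -> None:
        # After this runs, items is never None,
        # but the annotation still says Optional.
        if self.items is None:
            self.items = []


obj = ClassWithState("testing")
# Fine at runtime; a strict type checker still sees Optional[List[str]]
# here and is expected to flag the attribute access.
obj.items.append("one")
print(obj)
```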
Carcigenicate's nice answer handles the case when the desired default initialization doesn't depend on other parameters of the initializer. However, let's say that you need to look at name in order to know how to default-initialize items.
For this scenario, it would be great if there were an analog to the default_factory option that received the partially initialized object (or its other initializer parameters) as an argument, but unfortunately there is no such analog. Other things that might look related but don't serve the purpose:
- The init=False field option, which allows the field to be initialized in __post_init__ but removes the option of the user specifying an explicit value.
- Using the InitVar generic type does the opposite of what we want here: it makes the value available to the initializer (and __post_init__) without including it as a field of the dataclass object.
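The two behaviours listed above can be seen in a small sketch. The class names NoInit and OnlyInitVar are hypothetical and exist only for this illustration.

```python
from dataclasses import InitVar, dataclass, field, fields
from typing import List


@dataclass
class NoInit:
    name: str
    # init=False: set in __post_init__, but callers cannot pass a value.
    items: List[str] = field(init=False)

    def __post_init__(self) -> None:
        self.items = [str(i) for i in range(len(self.name))]


@dataclass
class OnlyInitVar:
    name: str
    # InitVar: accepted by __init__ and forwarded to __post_init__,
    # but never stored as a field of the object.
    items: InitVar[List[str]]

    def __post_init__(self, items: List[str]) -> None:
        pass


try:
    NoInit("ab", ["x"])  # rejected: items is not an __init__ parameter
except TypeError as exc:
    print("TypeError:", exc)

print(NoInit("ab"))
# Only "name" is a real field; the InitVar is not recorded.
print([f.name for f in fields(OnlyInitVar("ab", ["x"]))])
```

Neither class gives us a field that the caller may set explicitly and that also falls back to a value computed from name.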
Using a non-None sentinel value
However, as a workaround, you can designate a special sentinel object that signals to the __post_init__ method that the field's default value needs to be replaced. For most types, it is easy to create a unique dummy object of the appropriate type, store it as a class variable, and return it from the field's default_factory (if it is a mutable type like list, dataclass won't let you assign it directly as the default value). For types like str and int, this isn't guaranteed to work as expected unless you use a "change_me" value that you know will never be a legitimate explicit value for the field.
from dataclasses import dataclass, field
from typing import ClassVar, List

@dataclass
class ClassWithState:
    name: str
    # Unique sentinel list; ClassVar keeps it from becoming a dataclass field.
    __uninitialized_items: ClassVar[List[str]] = list()
    items: List[str] = field(default_factory=lambda: ClassWithState.__uninitialized_items)

    def __post_init__(self) -> None:
        # Identity check: an explicitly passed empty list is a different object.
        if self.items is self.__uninitialized_items:
            self.items = [str(i) for i in range(len(self.name))]

print(ClassWithState("testing", ["one", "two", "three"]))
print(ClassWithState("testing"))
print(ClassWithState("testing", []))
Output:
ClassWithState(name='testing', items=['one', 'two', 'three'])
ClassWithState(name='testing', items=['0', '1', '2', '3', '4', '5', '6'])
ClassWithState(name='testing', items=[])
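For the str and int caveat mentioned above, a sketch of the "change_me" variant might look like the following. The class Labeled, the field label, and the placeholder text "<change_me>" are all hypothetical choices for this illustration; the approach only works if that placeholder can never be a legitimate explicit value.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class Labeled:
    name: str
    # Hypothetical placeholder; it must never be a legitimate explicit label.
    _CHANGE_ME: ClassVar[str] = "<change_me>"
    label: str = _CHANGE_ME

    def __post_init__(self) -> None:
        # Compare by value: identity checks are unreliable for str and int,
        # because equal values may or may not share one object.
        if self.label == self._CHANGE_ME:
            self.label = self.name.upper()


print(Labeled("testing"))
print(Labeled("testing", "custom"))
```

Because the comparison is by value, a caller who explicitly passes "<change_me>" is indistinguishable from one who passes nothing, which is exactly the limitation described above.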
If the field can have a slightly different name ...
Using properties
If you do not require passing the explicit initial value by name (or if you can simply let the parameter have a slightly different name from the name you use when accessing the non-None value), then properties are an even more flexible option.
The idea is to have the Optional field be a separate (possibly even "private") member while a property gives access to a version that is automatically cast. I came across this solution in a situation where I needed to apply additional transformations whenever the object was accessed, and casting is just a special case (the ability to make the property read-only is nice as well). (You can consider cached_property if the object reference will never change.)
Here's an example:
from dataclasses import dataclass
from typing import List, Optional, cast

@dataclass
class ClassWithState:
    name: str
    _items: Optional[List[str]] = None

    @property
    def items(self) -> List[str]:
        return cast(List[str], self._items)

    @items.setter
    def items(self, value: List[str]) -> None:
        self._items = value

    def __post_init__(self) -> None:
        if self._items is None:
            self._items = [str(i) for i in range(len(self.name))]

print(ClassWithState("testing", _items=["one", "two", "three"]))
print(ClassWithState("testing", ["one", "two", "three"]))
print(ClassWithState("testing", []))
print(ClassWithState("testing"))
obj = ClassWithState("testing")
print(obj)
obj.items.append('test')
print(obj)
obj.items = ['another', 'one']
print(obj)
print(obj.items)
And the output:
ClassWithState(name='testing', _items=['one', 'two', 'three'])
ClassWithState(name='testing', _items=['one', 'two', 'three'])
ClassWithState(name='testing', _items=[])
ClassWithState(name='testing', _items=['0', '1', '2', '3', '4', '5', '6'])
ClassWithState(name='testing', _items=['0', '1', '2', '3', '4', '5', '6'])
ClassWithState(name='testing', _items=['0', '1', '2', '3', '4', '5', '6', 'test'])
ClassWithState(name='testing', _items=['another', 'one'])
['another', 'one']
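As for the cached_property remark above, one possible reading is sketched below. This is an assumption about how it could be applied, not something from the original example; the class name CachedState is hypothetical, and the default is computed lazily on first access instead of in __post_init__.

```python
from dataclasses import dataclass
from functools import cached_property
from typing import List, Optional


@dataclass
class CachedState:
    name: str
    _items: Optional[List[str]] = None

    @cached_property
    def items(self) -> List[str]:
        # Computed on first access, then stored on the instance.
        if self._items is None:
            self._items = [str(i) for i in range(len(self.name))]
        return self._items


obj = CachedState("abc")
print(obj.items)
print(obj.items is obj.items)
```

The cached value is not refreshed if _items is later rebound to a different list, which is why this only fits when the object reference never changes.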
Make an InitVar[Optional[...]] field and use __post_init__ to set the true field
Another alternative, if you can handle a different name, is to use InitVar to specify that the Optional version is just a parameter to __init__ (and __post_init__), and then to set a different, non-optional member variable within __post_init__. This avoids any casting, doesn't require setting up a property, lets the representation use the target name rather than the surrogate name, and doesn't risk the problem of not having a reasonable sentinel value. Again, though, it only works if you can handle an initializer parameter whose name differs from the accessed field, and it is less flexible than the property approach:
from dataclasses import InitVar, dataclass, field
from typing import List, Optional

@dataclass
class ClassWithState:
    name: str
    _items: InitVar[Optional[List[str]]] = None
    items: List[str] = field(init=False, default_factory=list)

    # InitVar values are passed to __post_init__ positionally,
    # so this parameter receives the value given as _items.
    def __post_init__(self, items: Optional[List[str]]) -> None:
        if items is None:
            items = [str(i) for i in range(len(self.name))]
        self.items = items

The usage is the same as the property approach, and the output would also look the same except that the representation wouldn't have the underscore in front of items.
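A short usage sketch for the InitVar variant follows. The class is repeated so the snippet runs on its own; the __post_init__ parameter is named _items here purely as a stylistic choice.

```python
from dataclasses import InitVar, dataclass, field
from typing import List, Optional


@dataclass
class ClassWithState:
    name: str
    _items: InitVar[Optional[List[str]]] = None
    items: List[str] = field(init=False, default_factory=list)

    def __post_init__(self, _items: Optional[List[str]]) -> None:
        # Fall back to a default derived from name when nothing was passed.
        if _items is None:
            _items = [str(i) for i in range(len(self.name))]
        self.items = _items


print(ClassWithState("testing", _items=["one", "two", "three"]))
print(ClassWithState("testing", []))

obj = ClassWithState("testing")
obj.items.append("test")
print(obj)
```

The representation shows items=... because the InitVar is not a field and so does not appear in the generated __repr__.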