Consider a dataclass with a mutable default value for an argument. To be able to instantiate an object with a new default value and not a shared mutable object, we can do something like:

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClassWithState:
    name: str
    items: Optional[List[str]] = None

    def __post_init__(self) -> None:
        if self.items is None:
            self.items = []

This works as expected. However, whenever I refer to items in some instance of this class, mypy warns that items may be None. For example:

c = ClassWithState("object name")
c.items.append("item1")

mypy will complain with something like:

Item "None" of "Optional[List[str]]" has no attribute "append".

I don't want to have to add unnecessary checks such as

assert c.items is not None

everywhere I refer to items. How can I convince mypy that items will never be None?

Metropolis

2 Answers

I'd use field with the default_factory option set:

from dataclasses import dataclass, field
from typing import List


@dataclass
class ClassWithState:
    name: str
    items: List[str] = field(default_factory=list)

>>> ClassWithState("Hello")
ClassWithState(name='Hello', items=[])
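
As a quick check that default_factory gives each instance its own list, here is a minimal sketch (the variable names a and b are only for illustration):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClassWithState:
    name: str
    # default_factory is called once per instance, so no list is shared.
    items: List[str] = field(default_factory=list)


a = ClassWithState("first")
b = ClassWithState("second")
a.items.append("item1")  # the declared type is List[str], not Optional

print(a.items)             # ['item1']
print(b.items)             # []
print(a.items is b.items)  # False
```

Because items is declared as List[str], there is no None case left for mypy to warn about.
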
Carcigenicate
  • doesn't this model set the default to an empty list, not None? What if you want a default to be None, rather than an empty list? I'm curious how to handle this, because the `Optional` defaulting to None - by typing's view of the world - seems to be 'standard', but it does cause mypy problems because `None` doesn't match your declared type. – Richard Nov 10 '20 at 17:42
  • @Richard Yes. I'm "reading between the lines" in this answer, and assumed that the OP was using `None` as a placeholder, and that they actually wanted something else (which appears to have been the case). You're saying you want the field to be of type `Optional[Something]`, but don't want type errors when you try to use the field in a non-optional context? – Carcigenicate Nov 10 '20 at 17:46
  • Mostly it's because of convenience and brevity. Convenience because you can check if a field is 'set' by a simple `if ClassWithState.items: do_something` and not have it throw an exception if the field isn't defined (much shorter than wrapping it in a `hasattr()`). Brevity because a lot of what I use them for results in me dumping the structures into json. The json encoder I'm using will skip anything with `None`, but for empty lists etc. it would dump that as an empty list. – Richard Nov 11 '20 at 18:26
  • @Richard You should be able to avoid mypy errors using `if your_optional` or `if your_optional is not None:` before use. Something like `return your_optional and your_optional.value_to_use` should work too due to how `and` works. Really, if you're using Optional, you should do a check for safety. In languages like Haskell where this pattern originated, it's actually a compile-time error to not handle the `None` case of an `Optional`. – Carcigenicate Nov 11 '20 at 18:39
  • Yes, that's what I do. I just would prefer not to: while it makes for type-safe code that passes mypy, it does add an indent and another line everywhere I want to use it. However, it is the correct thing to do and what I've been moving to. – Richard Nov 16 '20 at 21:30

The problem (and what if we need more flexibility?)

The issue is that we don't have any way to tell mypy that items will be Optional before __post_init__ but not afterward.

Carcigenicate's nice answer handles the case when the desired default initialization doesn't depend on other parameters of the initializer. However, let's say that you need to look at name in order to know how to default-initialize items.

For this scenario, it would be great if there were an analog of default_factory that received the partially initialized object as a parameter, but unfortunately there is no such analog. Other things that might look related but don't serve the purpose:

  • The init=False field option, which allows the field to be initialized in __post_init__ but removes the user's option of specifying an explicit value.
  • The InitVar generic type, which does the opposite of what we want here: it makes the value available to the initializer (and __post_init__) without including it as a field of the dataclass object.
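
To illustrate the first bullet, here is a minimal sketch (the class name InitFalseExample is made up for this example) showing that init=False lets __post_init__ compute the value from name but rejects an explicit value:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InitFalseExample:  # hypothetical name, for illustration only
    name: str
    # Excluded from __init__; must be filled in by __post_init__.
    items: List[str] = field(init=False)

    def __post_init__(self) -> None:
        self.items = [str(i) for i in range(len(self.name))]


print(InitFalseExample("abc"))  # InitFalseExample(name='abc', items=['0', '1', '2'])

try:
    InitFalseExample("abc", ["explicit"])  # type: ignore
except TypeError as exc:
    # __init__ only accepts `name`, so the caller cannot supply items.
    print("TypeError:", exc)
```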

Using a non-None sentinel value

However, as a workaround, you can designate a special sentinel object that tells the __post_init__ method that the field's default value needs to be replaced. For most types, it is easy to create a unique dummy object of the particular type, store it as a class variable, and return it from the field's default_factory (if it is a mutable type like list, dataclass won't let you assign it directly as the default value). For types like str and int, this isn't guaranteed to work as expected unless you use a "change_me" value that you know won't be a legitimate explicit value for the field.

from dataclasses import dataclass, field
from typing import ClassVar, List


@dataclass
class ClassWithState:
    name: str
    # Shared sentinel list; a ClassVar is not treated as a dataclass field.
    __uninitialized_items: ClassVar[List[str]] = list()
    items: List[str] = field(default_factory=lambda: ClassWithState.__uninitialized_items)

    def __post_init__(self) -> None:
        # Identity check: an explicitly passed empty list is not the sentinel.
        if self.items is self.__uninitialized_items:
            self.items = [str(i) for i in range(len(self.name))]


print(ClassWithState("testing", ["one", "two", "three"]))
print(ClassWithState("testing"))
print(ClassWithState("testing", []))

Output:

ClassWithState(name='testing', items=['one', 'two', 'three'])
ClassWithState(name='testing', items=['0', '1', '2', '3', '4', '5', '6'])
ClassWithState(name='testing', items=[])
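
For a str field, the "change_me" idea mentioned above could look like the following sketch. The class name Labeled, the field label, and the sentinel text "<change_me>" are assumptions made for this example; the sentinel must be a value that callers would never pass legitimately, and it is compared with == rather than `is` because string identity is not reliable.

```python
from dataclasses import dataclass

# Hypothetical sentinel; assumed never to be a legitimate label.
_CHANGE_ME = "<change_me>"


@dataclass
class Labeled:  # hypothetical class, for illustration only
    name: str
    label: str = _CHANGE_ME

    def __post_init__(self) -> None:
        # Compare by value: equal strings need not be the same object.
        if self.label == _CHANGE_ME:
            self.label = self.name.upper()


print(Labeled("testing"))            # Labeled(name='testing', label='TESTING')
print(Labeled("testing", "custom"))  # Labeled(name='testing', label='custom')
print(Labeled("testing", ""))        # Labeled(name='testing', label='')
```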

If the field can have a slightly different name ...

Using properties

If you do not need to pass the explicit initial value by name (or if you can let the parameter have a slightly different name from the one you use when you want the non-None type), then properties are an even more flexible option. The idea is to keep the Optional field as a separate (possibly even "private") member while a property gives access to a version that is automatically cast. I came across this solution in a situation where I needed to apply additional transformations whenever the object was accessed; casting is just a special case (the ability to make the property read-only is nice as well). (You can consider cached_property if the object reference will never change.)

Here's an example:

from dataclasses import dataclass
from typing import List, Optional, cast


@dataclass
class ClassWithState:
    name: str
    _items: Optional[List[str]] = None

    @property
    def items(self) -> List[str]:
        return cast(List[str], self._items)

    @items.setter
    def items(self, value: List[str]) -> None:
        self._items = value

    def __post_init__(self) -> None:
        if self._items is None:
            self._items = [str(i) for i in range(len(self.name))]


print(ClassWithState("testing", _items=["one", "two", "three"]))
print(ClassWithState("testing", ["one", "two", "three"]))
print(ClassWithState("testing", []))
print(ClassWithState("testing"))

obj = ClassWithState("testing")
print(obj)
obj.items.append('test')
print(obj)
obj.items = ['another', 'one']
print(obj)
print(obj.items)

And the output:

ClassWithState(name='testing', _items=['one', 'two', 'three'])
ClassWithState(name='testing', _items=['one', 'two', 'three'])
ClassWithState(name='testing', _items=[])
ClassWithState(name='testing', _items=['0', '1', '2', '3', '4', '5', '6'])
ClassWithState(name='testing', _items=['0', '1', '2', '3', '4', '5', '6'])
ClassWithState(name='testing', _items=['0', '1', '2', '3', '4', '5', '6', 'test'])
ClassWithState(name='testing', _items=['another', 'one'])
['another', 'one']
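
If the field should not be reassigned after construction, the setter can simply be left out. A minimal sketch of that read-only variant, under the same assumptions as the example above (the class name ReadOnlyItems is hypothetical):

```python
from dataclasses import dataclass
from typing import List, Optional, cast


@dataclass
class ReadOnlyItems:  # hypothetical name, for illustration only
    name: str
    _items: Optional[List[str]] = None

    def __post_init__(self) -> None:
        if self._items is None:
            self._items = [str(i) for i in range(len(self.name))]

    @property
    def items(self) -> List[str]:
        # cast() only informs the type checker; it performs no runtime check.
        return cast(List[str], self._items)


obj = ReadOnlyItems("abc")
obj.items.append("test")  # mutating the list in place still works
print(obj.items)          # ['0', '1', '2', 'test']

try:
    obj.items = ["another"]  # type: ignore
except AttributeError:
    # No setter is defined, so rebinding the property fails.
    print("items is read-only")
```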

Make an InitVar[Optional[...]] field and use __post_init__ to set the true field

Another alternative, if you can handle a different name, is to use InitVar to specify that the Optional version is just a parameter to __init__ (and __post_init__), and then to set a different, non-optional member variable within __post_init__. This avoids any casting, doesn't require setting up a property, lets the representation use the target name rather than the surrogate name, and doesn't risk lacking a reasonable sentinel value. Again, though, it only works if you can handle an initializer parameter whose name differs from the accessed field, and it is less flexible than the property approach:

from dataclasses import InitVar, dataclass, field
from typing import List, Optional


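@dataclass
class ClassWithState:
    name: str
    # Init-only parameter: accepted by __init__ but not stored as a field.
    _items: InitVar[Optional[List[str]]] = None
    items: List[str] = field(init=False, default_factory=list)

    # InitVar values are passed to __post_init__ positionally, in definition order.
    def __post_init__(self, _items: Optional[List[str]]) -> None:
        if _items is None:
            _items = [str(i) for i in range(len(self.name))]
        self.items = _items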

The usage is the same as the property approach, and the output would also look the same except that the representation wouldn't have the underscore in front of items.
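
A short usage sketch of the InitVar approach; the class is repeated here (with the __post_init__ parameter named _items) only so that the block runs on its own:

```python
from dataclasses import InitVar, dataclass, field
from typing import List, Optional


@dataclass
class ClassWithState:
    name: str
    # Init-only parameter: accepted by __init__ but not stored as a field.
    _items: InitVar[Optional[List[str]]] = None
    items: List[str] = field(init=False, default_factory=list)

    # InitVar values are passed to __post_init__ positionally.
    def __post_init__(self, _items: Optional[List[str]]) -> None:
        if _items is None:
            _items = [str(i) for i in range(len(self.name))]
        self.items = _items


print(ClassWithState("testing", _items=["one", "two", "three"]))
# ClassWithState(name='testing', items=['one', 'two', 'three'])
print(ClassWithState("testing", []))
# ClassWithState(name='testing', items=[])
print(ClassWithState("testing"))
# ClassWithState(name='testing', items=['0', '1', '2', '3', '4', '5', '6'])

obj = ClassWithState("testing")
obj.items.append("test")  # items is a plain List[str], so no cast is needed
print(obj.items)
# ['0', '1', '2', '3', '4', '5', '6', 'test']
```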

teichert