
I am defining a class about as follows:

from numbers import Number
from typing import Dict

from typeguard import typechecked

Data = Dict[str, Number]

@typechecked
class Foo:
    def __init__(self, data: Data):
        self._data = dict(data)
    @property
    def data(self) -> Data:
        return self._data

I am using typeguard. My intention is to restrict the types that can go into the data dictionary. Obviously, typeguard does check the entire dictionary if it is passed into a function or returned from one. If the dictionary is "exposed" directly, it becomes the dictionary's "responsibility" to check types - which does not work, obviously:

bar = Foo({'x': 2, 'y': 3}) # ok

bar = Foo({'x': 2, 'y': 3, 'z': 'not allowed'}) # error as expected

bar.data['z'] = 'should also be not allowed but still is ...' # no error, but should cause one

PEP 589 introduces typed dictionaries, but for a fixed set of keys (similar to struct-like constructs in other languages). In contrast, I need this for a flexible number of arbitrary keys.
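To make the limitation concrete, here is a minimal sketch (the `Point` type is a made-up example, not part of my actual code). A `TypedDict` instance is an ordinary `dict` at runtime, so nothing stops an unexpected key or value from being stored; only a static checker would complain.

```python
from typing import TypedDict


# Hypothetical example type; the set of keys is fixed at definition time.
class Point(TypedDict):
    x: int
    y: int


p = Point(x=1, y=2)

# At runtime p is a plain dict, so no check happens on assignment.
p['z'] = 'not a number'  # a static type checker would flag this line

print(type(p) is dict)
print('z' in p)
```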

My best bad idea is to go "old-school": Sub-classing dict and re-implementing every bit of API through which data can go in (and out) of the dictionary and adding type checks to them:

from typing import Union

@typechecked
class TypedDict(dict): # just a sketch
    def __init__(
        self,
        other: Union[Data, None] = None,
        **kwargs: Number,
    ):
        pass # TODO
    def __setitem__(self, key: str, value: Number):
        pass # TODO
    # TODO

Is there a valid alternative that does not require the "old-school" approach?

s-m-e
  • "My best bad idea is to go "old-school": Sub-classing dict and re-implementing every bit of API through which data can go in (and out) of the dictionary and adding type checks to them" Don't do that. Use something like [collections.UserDict](https://docs.python.org/3/library/collections.html#collections.UserDict) – juanpa.arrivillaga Oct 14 '21 at 08:36
  • In any case, I don't see any straightforward way for typeguard to accomplish this. Your `dict` is a regular Python `dict`; there is nothing you can do about it other than to provide your own custom dict. Note, *static* analysis would catch this error, e.g., with `mypy`. EDIT: PEP 589 is irrelevant here. `TypedDict`s are **simply for type hinting**; they return *regular dicts*. – juanpa.arrivillaga Oct 14 '21 at 08:37
  • You aren't actually asking about static type hinting; you are interested in runtime validation, it seems – juanpa.arrivillaga Oct 14 '21 at 08:40
  • @juanpa.arrivillaga Yep, this is runtime checking. I adjusted the title. Hope it fits. Sub-classing `UserDict` instead of `dict`: Point taken, thanks. I am aware that PEP 589 is irrelevant, but I did not even know it existed before researching this question :) So I thought I might have overlooked something similar, closer to my needs. – s-m-e Oct 14 '21 at 09:10

2 Answers


There seem to be several parts to your question.


(1) Creating a type-checked dictionary at runtime


As @juanpa.arrivillaga says in the comments, this has everything to do with type-checking, but doesn't seem to have anything to do with type-hinting. However, it's fairly trivial to design your own custom type-checked data structure. You can do it like this using collections.UserDict:

from collections import UserDict
from numbers import Number

class StrNumberDict(UserDict):
    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError(
                f'Invalid type for dictionary key: '
                f'expected "str", got "{type(key).__name__}"'
            )
        if not isinstance(value, Number):
            raise TypeError(
                f'Invalid type for dictionary value: '
                f'expected "Number", got "{type(value).__name__}"'
            )
        super().__setitem__(key, value)

In usage:

>>> d = StrNumberDict()
>>> d['foo'] = 5
>>> d[5] = 6
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<string>", line 5, in __setitem__
TypeError: Invalid type for dictionary key: expected "str", got "int"
>>> d['bar'] = 'foo'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<string>", line 10, in __setitem__
TypeError: Invalid type for dictionary value: expected "Number", got "str"

If you wanted to generalise this kind of thing, you could do it like this:

from collections import UserDict

class TypeCheckedDict(UserDict):
    def __init__(self, key_type, value_type, initdict=None):
        self._key_type = key_type
        self._value_type = value_type
        super().__init__(initdict)

    def __setitem__(self, key, value):
        if not isinstance(key, self._key_type):
            raise TypeError(
                f'Invalid type for dictionary key: '
                f'expected "{self._key_type.__name__}", '
                f'got "{type(key).__name__}"'
            )
        if not isinstance(value, self._value_type):
            raise TypeError(
                f'Invalid type for dictionary value: '
                f'expected "{self._value_type.__name__}", '
                f'got "{type(value).__name__}"'
            )
        super().__setitem__(key, value)

In usage:

>>> from numbers import Number
>>> d = TypeCheckedDict(key_type=str, value_type=Number, initdict={'baz': 3.14})
>>> d['baz']
3.14
>>> d[5] = 5
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<string>", line 9, in __setitem__
TypeError: Invalid type for dictionary key: expected "str", got "int"
>>> d['foo'] = 'bar'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<string>", line 15, in __setitem__
TypeError: Invalid type for dictionary value: expected "Number", got "str"
>>> d['foo'] = 5
>>> d['foo']
5
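As a hedged sketch of how this could be wired into the `Foo` class from the question: the block below restates `TypeCheckedDict` in condensed form (with shortened error messages) so that it runs on its own, and drops the `typeguard` decorator to stay dependency-free. Treat it as an illustration under those assumptions rather than a finished design.

```python
from collections import UserDict
from numbers import Number


class TypeCheckedDict(UserDict):
    """Condensed restatement of the class above, so this block runs alone."""

    def __init__(self, key_type, value_type, initdict=None):
        self._key_type = key_type
        self._value_type = value_type
        super().__init__(initdict)

    def __setitem__(self, key, value):
        if not isinstance(key, self._key_type):
            raise TypeError(f'bad key type: {type(key).__name__}')
        if not isinstance(value, self._value_type):
            raise TypeError(f'bad value type: {type(value).__name__}')
        super().__setitem__(key, value)


class Foo:
    def __init__(self, data):
        # Copy into the checked container; invalid entries raise here.
        self._data = TypeCheckedDict(str, Number, data)

    @property
    def data(self):
        return self._data


bar = Foo({'x': 2, 'y': 3})
bar.data['w'] = 1.5  # accepted: str key, numeric value

try:
    bar.data['z'] = 'should not be allowed'
except TypeError as exc:
    print(f'rejected: {exc}')
```

With this arrangement the exposed dictionary enforces its own types, which is the behaviour the question asks for.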

Note that you don't need to do type checks for the dictionary you pass to super().__init__(). UserDict.__init__ calls self.__setitem__, which you've already overridden, so if you pass an invalid dictionary to TypeCheckedDict.__init__, you'll find an exception is raised in just the same way as if you try to add an invalid key or value to the dictionary after it has been constructed:

>>> from numbers import Number
>>> d = TypeCheckedDict(str, Number, {'foo': 'bar'})
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<string>", line 5, in __init__
  line 985, in __init__
    self.update(dict)
  line 842, in update
    self[key] = other[key]
  File "<string>", line 16, in __setitem__
TypeError: Invalid type for dictionary value: expected "Number", got "str"

UserDict is specifically designed for easy subclassing in this way, which is why it is a better base class in this instance than dict.
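A small comparison may make the difference clearer. The two classes below are hypothetical and differ only in their base class: `dict.update` does not go through an overridden `__setitem__`, whereas `UserDict` inherits `update` from `MutableMapping`, which assigns item by item through `self[key] = ...`.

```python
from collections import UserDict


def check_int(value):
    """Shared check used by both example classes."""
    if not isinstance(value, int):
        raise TypeError(f'expected "int", got "{type(value).__name__}"')


class CheckedPlainDict(dict):
    def __setitem__(self, key, value):
        check_int(value)
        super().__setitem__(key, value)


class CheckedUserDict(UserDict):
    def __setitem__(self, key, value):
        check_int(value)
        super().__setitem__(key, value)


plain = CheckedPlainDict()
plain.update({'a': 'oops'})  # dict.update bypasses the override
print(plain)

user = CheckedUserDict()
try:
    user.update({'a': 'oops'})  # routed through __setitem__
except TypeError as exc:
    print(f'UserDict rejected it: {exc}')
```

With a `dict` subclass you would have to override `update`, `setdefault`, the constructor and so on yourself to close these gaps.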

If you wanted to add type hints to TypeCheckedDict, you'd do it like this:

from collections import UserDict
from collections.abc import Mapping, Hashable
from typing import TypeVar, Optional

K = TypeVar('K', bound=Hashable)
V = TypeVar('V')

class TypeCheckedDict(UserDict[K, V]):
    def __init__(
        self, 
        key_type: type[K], 
        value_type: type[V], 
        initdict: Optional[Mapping[K, V]] = None
    ) -> None:
        self._key_type = key_type
        self._value_type = value_type
        super().__init__(initdict)

    def __setitem__(self, key: K, value: V) -> None:
        if not isinstance(key, self._key_type):
            raise TypeError(
                f'Invalid type for dictionary key: '
                f'expected "{self._key_type.__name__}", '
                f'got "{type(key).__name__}"'
            )
        if not isinstance(value, self._value_type):
            raise TypeError(
                f'Invalid type for dictionary value: '
                f'expected "{self._value_type.__name__}", '
                f'got "{type(value).__name__}"'
            )
        super().__setitem__(key, value)

(The above passes MyPy.)

Note, however, that adding type hints has no relevance at all to how this data structure works at runtime.


(2) Type-hinting dictionaries "for a flexible number of arbitrary keys"


I'm not quite sure what you mean by this, but if you want MyPy to raise an error if you add a string value to a dictionary you only want to have numeric values, you could do it like this:

from typing import SupportsFloat

d: dict[str, SupportsFloat] = {}
d['a'] = 5  # passes MyPy 
d['b'] = 4.67 # passes MyPy
d[5] = 6 # fails MyPy
d['baz'] = 'foo' # fails MyPy

If you want MyPy static checks and runtime checks, your best bet (in my opinion) is to use the type-hinted version of TypeCheckedDict above:

d = TypeCheckedDict(str, SupportsFloat) # type: ignore[misc]
d['a'] = 5  # passes MyPy 
d['b'] = 4.67  # passes MyPy 
d[5] = 6  # fails MyPy
d['baz'] = 'foo'  # fails MyPy

MyPy isn't too happy about us passing an abstract type in as a parameter to TypeCheckedDict.__init__, so you have to add a # type: ignore[misc] comment when instantiating the dict. (That feels like a MyPy bug to me.) Other than that, however, it works fine.

(See my previous answer for caveats about using SupportsFloat to hint numeric types. Use typing.Dict instead of dict for type-hinting if you're on Python <= 3.8.)
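As a rough illustration of why SupportsFloat can also be used for the runtime isinstance checks in TypeCheckedDict: it is a runtime-checkable protocol, so isinstance only looks for a `__float__` method. The sample values below are arbitrary choices for this sketch.

```python
from decimal import Decimal
from fractions import Fraction
from numbers import Number
from typing import SupportsFloat

# Arbitrary sample values, chosen only for illustration.
samples = [5, 4.67, Fraction(1, 3), Decimal('1.5'), 'foo', None]

# Map each sample's repr to (is a Number, supports float()).
results = {
    repr(sample): (
        isinstance(sample, Number),
        isinstance(sample, SupportsFloat),
    )
    for sample in samples
}

for name, (is_number, supports_float) in results.items():
    print(f'{name:>20}  Number={is_number}  SupportsFloat={supports_float}')
```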


(3) Using typeguard


Since you're using typeguard, you could simplify the logic in my StrNumberDict class a little, like so:

from collections import UserDict
from typeguard import typechecked
from typing import SupportsFloat

class StrNumberDict(UserDict[str, SupportsFloat]):
    @typechecked
    def __setitem__(self, key: str, value: SupportsFloat) -> None:
        super().__setitem__(key, value)

However, I don't think there's a way of doing this with typeguard if you want to have a more generic TypeCheckedDict that can be instantiated with arbitrary type-checking. The following does not work:

### THIS DOES NOT WORK ###

from typing import TypeVar, SupportsFloat
from collections.abc import Hashable
from collections import UserDict
from typeguard import typechecked

K = TypeVar('K', bound=Hashable)
V = TypeVar('V')

class TypeCheckedDict(UserDict[K, V]):
    @typechecked
    def __setitem__(self, key: K, value: V) -> None:
        super().__setitem__(key, value)

d = TypeCheckedDict[str, SupportsFloat]()
d[5] = 'foo'  # typeguard raises no error here.

It may also be worth noting that, at the time of writing, typeguard is not actively maintained, so there is a certain amount of risk involved in relying on that particular library.

Alex Waygood

Good suggestions, but I think it can be much simpler than all that. @alex-waygood has some great explanations, and even if, like me, you find his solution a tad overkill, his answer is much more correct.

class HomebrewTypeGuarding:
    """Class mimics the primary interface of a dict, as an example
        of naive guarding using the __annotations__ collection that
        exists on any type-annotated object.
    """
    a: str
    b: int
    def __getitem__(self, key):
        return (
            self.__dict__[key] 
            if key in self.__dict__ 
            else None
        )
    def __setitem__(self, key, value):
        if (
            key in self.__annotations__
            and isinstance(value, self.__annotations__[key])
        ):
            self.__dict__[key] = value
        else:
            raise ValueError(
                "Incorrect type for value on {}:{} = {}".format(
                    key,
                    str(
                        self.__annotations__[key] 
                        if key in self.__annotations__
                        else None
                    ),
                    str(value)
                )
            )
        

The plumbing here is much simpler, but do note I'm not claiming this is a dict. My intention is

  1. To simply demonstrate type-guarding using the annotations collection, and
  2. To do so in a way that can extend much further beyond dicts.

If you're really just after a type-checked dict, well, @alex-waygood goes into much greater detail there, and his solution is actually complete and correct.

If you are more interested in runtime type-checking at property or attribute assignment, I believe this solution is superior, simply because it works reliably for many types. The more correct and complete option either (A) has to be copy-pasted into every use-case implementation, or (B) has to be converted into a metaclass or decorator that modifies the __getitem__ and __setitem__ dunders, along the lines discussed below.
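For the attribute-assignment case, a minimal sketch could look like the following. The names `AnnotationChecked` and `Point` are hypothetical, and the sketch assumes every annotation is a plain class; generics such as `list[int]` are not supported here.

```python
from typing import get_type_hints


class AnnotationChecked:
    """Hypothetical mixin: checks attribute assignment against annotations."""

    def __setattr__(self, name, value):
        # Resolve the annotations declared on the class (and its bases).
        expected = get_type_hints(type(self)).get(name)
        # This sketch only supports plain classes as annotations.
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(
                f'expected "{expected.__name__}" for attribute "{name}", '
                f'got "{type(value).__name__}"'
            )
        super().__setattr__(name, value)


class Point(AnnotationChecked):
    x: int
    y: int


p = Point()
p.x = 1  # accepted

try:
    p.y = 'two'
except TypeError as exc:
    print(f'rejected: {exc}')
```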

Note that, for B, there are many possible use cases with different requirements. @s-m-e specified dict, and this doesn't directly work as a wrapper/decorator for a dict, but it could be readily adapted. For that, I would prefer that you invoke the original __getitem__/__setitem__ dunders on the decorated class/instance instead of directly mutating self.__dict__ or the self collection; I was going for the simplest possible example of using the annotations object to type-check. If you write a class or instance decorator that installs (or overrides) a __getitem__/__setitem__ pair, and those dunders simply read/mutate self.__dict__, it breaks for dicts. If you heed this disclaimer and write out the super().__getitem__(in_key) call, there is still an edge case with first-generation classes (some implementations of such a decorator still fail) and with multiple inheritance, because of the usual issues with inheritance traversal. By the time you're solving that, you're almost definitely using getattr on the type and exploring multiple parallel paths for the relevant __getitem__ and __setitem__ methods or properties.
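One possible shape for option B, as a minimal sketch only: a class decorator that wraps whatever `__setitem__` the class already resolves to and delegates to it after the checks. The names `guard_items` and `Scores` are hypothetical, and the sketch sidesteps the multiple-inheritance issues mentioned above by capturing the original method once, at decoration time.

```python
from collections import UserDict
from numbers import Number


def guard_items(key_type, value_type):
    """Hypothetical class decorator adding isinstance checks to __setitem__."""

    def decorate(cls):
        # Looked up through the MRO, so an inherited __setitem__ works too.
        # Raises AttributeError if the class has no __setitem__ at all.
        original = cls.__setitem__

        def checked_setitem(self, key, value):
            if not isinstance(key, key_type):
                raise TypeError(f'bad key type: {type(key).__name__}')
            if not isinstance(value, value_type):
                raise TypeError(f'bad value type: {type(value).__name__}')
            # Delegate to the original dunder instead of touching __dict__.
            original(self, key, value)

        cls.__setitem__ = checked_setitem
        return cls

    return decorate


@guard_items(str, Number)
class Scores(UserDict):
    pass


s = Scores({'a': 1})
s['b'] = 2.5

try:
    s['c'] = 'x'
except TypeError as exc:
    print(f'rejected: {exc}')
```

Applied to a direct subclass of dict instead of UserDict, the same decorator would only guard `obj[key] = value`, not `update` or the constructor.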

Sam Hughes