
I am building a Python library and use type hints to keep certain data representations consistent. In particular, I use Union (sum types) in a nested fashion to represent the different "flavors" a datum can take.

What I have so far is similar to the following example:

from typing import Union

MyNumberT = Union[float,int]
MyDataT = Union[str,MyNumberT]

def my_data_to_string(datum: MyDataT) -> str:
    if isinstance(datum, float):
        return _my_number_to_string(datum)
    elif isinstance(datum, int):
        return _my_number_to_string(datum)
    elif isinstance(datum, str):
        return datum
    # assert_never omitted for simplicity

def _my_number_to_string(number: MyNumberT) -> str:
    return "%s" % number

This type-checks fine with mypy.

Now, my real code is a bit more complex, and I need to perform some common operations on variables that are of type MyNumberT. In the example, this is simply highlighted by adapting the import and replacing my_data_to_string as in the following:

from typing import get_args, Union

[...]

def my_data_to_string(datum: MyDataT) -> str:
    if isinstance(datum, get_args(MyNumberT)):
        return _my_number_to_string(datum)
    elif isinstance(datum, str):
        return datum
    # assert_never omitted for simplicity

[...]

With this change, mypy's type check fails: Argument 1 to "_my_number_to_string" has incompatible type "Union[str, Union[float, int]]"; expected "Union[float, int]".

I expected mypy to "realise" that in the first branch, datum could only be of type float or int, but the error message indicates it's not the case...

How could I achieve some pattern matching over "parts" of such nested types?

Pamplemousse

2 Answers


Your use case is a good fit for singledispatch, a utility provided by functools. It lets you register several implementations of a single function and picks one based on the type of the first argument.

from functools import singledispatch

# This defines the generic function, with
# a fallback if the input type doesn't match
@singledispatch
def my_data_to_string(datum) -> str:
    raise TypeError(f"unsupported format: {type(datum)}")

# Registering for type str using type hint
@my_data_to_string.register
def _(datum: str):
    return datum

# Registering for multiple 
# types using decorator
@my_data_to_string.register(float)
@my_data_to_string.register(int)
def _(datum):
    return "<%s>" % datum


print(my_data_to_string("a"))    # a
print(my_data_to_string(1))      # <1>
print(my_data_to_string(1.5))    # <1.5>
print(my_data_to_string([1, 2])) # TypeError

It is extensible, readable, and doesn't trigger errors in linters or formatters. See the functools.singledispatch documentation.
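
If listing every member type in a decorator is a concern, one possible variation is to register a shared handler for each member of the union in a loop. This is only a sketch: it assumes the MyNumberT alias from the question, and the helper name _number_to_string is made up for illustration. Note that the dispatch happens at runtime, so it does not by itself make mypy narrow anything.

```python
from functools import singledispatch
from typing import Union, get_args

MyNumberT = Union[float, int]  # alias taken from the question


@singledispatch
def my_data_to_string(datum) -> str:
    # Fallback for types that have no registered implementation
    raise TypeError(f"unsupported format: {type(datum)}")


@my_data_to_string.register
def _(datum: str) -> str:
    return datum


def _number_to_string(datum) -> str:
    # Hypothetical shared handler for every member of MyNumberT
    return "<%s>" % datum


# Register the shared handler once per member of the union,
# without spelling out float and int at the call site.
for member in get_args(MyNumberT):
    my_data_to_string.register(member, _number_to_string)
```

This keeps the knowledge of what MyNumberT contains in one place, at the cost of dispatching on concrete classes only.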

AnkurSaxena
  • While certainly a valid approach to the underlying problem, this does not seem to actually answer the question. – MisterMiyagi Jan 27 '21 at 10:04
  • Interesting approach indeed. However, I see two problems: 1) My code having more types in the `Union` and more functions to work on it (hence more pattern matches), this syntax would imply a lot of decorators, as well as many extra function declarations; 2) More importantly, it asks the writer of the function to know about all the internal details of the `Union`, to treat each "end" type in the decorator, which kinda goes against the idea of working with "high level" data types. – Pamplemousse Jan 27 '21 at 15:52

Starting with Python 3.10, unions are valid for isinstance checks:

def my_data_to_string(datum: MyDataT) -> str:
    if isinstance(datum, MyNumberT):
        return _my_number_to_string(datum)
    elif isinstance(datum, str):
        return datum
    # assert_never omitted for simplicity

As long as it is sufficient to exclude one constituent of the union, reversing the check works without any version requirement:

def my_data_to_string(datum: MyDataT) -> str:
    if isinstance(datum, str):  # handle explicit type first
        return datum
    else:  # catch-all for remaining types
        return _my_number_to_string(datum)
    # rely on type checker for safety!

Notice that this uses an else instead of an elif clause – rely on the type checker to reject incorrectly typed arguments.
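
When the numeric branch has to come first, another option may be to keep the runtime classes in a plain tuple next to the alias. The sketch below assumes a hand-maintained constant, MY_NUMBER_CLASSES, which is not part of the original code. As far as I can tell, mypy infers a precise tuple type for such a constant and narrows on it, unlike the result of get_args, but this is worth confirming with your mypy version.

```python
from typing import Union

MyNumberT = Union[float, int]
MyDataT = Union[str, MyNumberT]

# Hypothetical constant: must be kept in sync with MyNumberT by hand.
MY_NUMBER_CLASSES = (float, int)


def _my_number_to_string(number: MyNumberT) -> str:
    return "%s" % number


def my_data_to_string(datum: MyDataT) -> str:
    if isinstance(datum, MY_NUMBER_CLASSES):  # tuple of concrete classes
        return _my_number_to_string(datum)
    elif isinstance(datum, str):
        return datum
    # Reached only if a caller ignores the type hints
    raise TypeError(f"unsupported type: {type(datum)}")
```

The drawback is the duplication between the alias and the tuple, so this trades some safety for compatibility with older Python versions.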


For more complex types, you can build a type guard:

from typing import Literal, Tuple, Union, get_args

def guard_mnt(arg: MyDataT) -> Union[Literal[False], Tuple[MyNumberT]]:
    return (arg,) if isinstance(arg, get_args(MyNumberT)) else False  # type: ignore

This tells a type checker that the function returns either the desired type wrapped in a tuple or False. The type: ignore is required because the guard uses the same isinstance check as the question, which the type checker cannot follow; the function serves to wrap a valid static type around the unsupported runtime check.

It can be used via assignment expressions and unpacking:

def my_data_to_string(datum: MyDataT) -> str:
    if nums := guard_mnt(datum):  # only enter branch if guard is not False
        return _my_number_to_string(*nums)
    elif isinstance(datum, str):
        return datum
    # assert_never omitted for simplicity
Pamplemousse
MisterMiyagi
  • The type guard looks like an interesting idea. Although, there are two things I don't understand: 1) Why does the `# type: ignore` need to be present? 2) Why `arg` gets put inside a tuple... Could it be returned directly (as a `MyNumberT`)? Also, the `assert_never` mechanism seems broken by it (`"assert_never" has incompatible type "Union[float,int]"; expected "NoReturn"` - which corresponds to `MyNumberT`...). – Pamplemousse Jan 27 '21 at 20:50
  • The ``# type: ignore`` is needed because the guard still uses the same mechanism as in the question (``isinstance`` + ``get_args``) which the type checker does not actually understand, and thus assumes ``(arg,)`` is a ``Tuple[MyDataT]``. The tuple is needed to ensure the result is true if the type is matched; both ``0`` and ``0.0`` are false, but ``(0,)`` and ``(0.0,)`` are true. – MisterMiyagi Jan 27 '21 at 21:24