When iterating over a heterogeneous sequence (containing elements of type T1 and T2, say), mypy infers the target variable to have type object (or another common supertype of T1 and T2, e.g. float if the elements were 1 and 1.2):

```python
xs = [1, "1"]
for x in xs:
    reveal_type(x)  # note: Revealed type is 'builtins.object*'
```

Wouldn't it make more sense for the inferred type to be Union[T1, T2]? Then if both T1 and T2 have some common attribute which the common base class lacks, the loop body would be allowed to access that attribute without irritating casts or isinstance assertions.
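
To make the motivating case concrete, here is a minimal sketch; `Cat`, `Dog`, `pets` and `names` are hypothetical names chosen for illustration. Both classes define `name`, but their only common base class is `object`, which does not. With the union written out explicitly, a type checker can accept `pet.name`, because every member of the union has that attribute.

```python
from typing import List, Union

# Hypothetical classes: both define `name`, but their only shared
# base class is `object`, which has no `name` attribute.
class Cat:
    def __init__(self, name: str) -> None:
        self.name = name

class Dog:
    def __init__(self, name: str) -> None:
        self.name = name

# Explicit union annotation: every member of the union has `name`.
pets: List[Union[Cat, Dog]] = [Cat("Tom"), Dog("Rex")]

def names(items: List[Union[Cat, Dog]]) -> List[str]:
    # Accepted by the checker without casts or isinstance checks.
    return [pet.name for pet in items]
```

If `pets` were left unannotated, mypy would infer the element type as `object` (the behaviour described in the question) and report that `object` has no attribute `name`.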

Why does mypy infer a single shared base type instead of a Union here?

ash

Picking the common base class of the list elements (picking the join) instead of taking the union of the elements is a deliberate design choice that mypy made.
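
For plain classes, the "join" is roughly the most specific class that all the element types inherit from. The following is a toy sketch of that idea using each class's runtime MRO; `naive_join` is a hypothetical helper, not part of mypy, and mypy's real join algorithm also deals with generics, protocols, tuples, numeric promotions and more.

```python
from typing import List, Type

def naive_join(classes: List[Type[object]]) -> Type[object]:
    """Return the most specific class that every class in `classes` inherits from.

    Toy approximation of a join for plain nominal classes only: walk the
    MRO of the first class and return the first entry that all the other
    classes are subclasses of. `classes` must be non-empty.
    """
    first, *rest = classes
    for candidate in first.__mro__:
        if all(issubclass(cls, candidate) for cls in rest):
            return candidate
    return object  # not reached in practice: every class inherits from object
```

For example, `naive_join([int, str])` returns `object`, matching the `builtins.object` in the question. It also returns `object` for `[int, float]`, because the `int`-to-`float` compatibility that mypy applies is a special promotion rule rather than inheritance, and this sketch does not model it.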

In short, the problem is that whichever of the two solutions you pick, you'll always end up with edge cases that are inconvenient for somebody. For example, inferring the union would be inconvenient in cases like the following, where you want to modify or add to the list instead of only reading it:

```python
class Parent: pass
class Child1(Parent): pass
class Child2(Parent): pass
class Child3(Parent): pass

# If foo is inferred to be type List[Union[Child1, Child2]] instead of List[Parent]
foo = [Child1(), Child2()]

# ...then this will fail with a type error, which is annoying.
foo.append(Child3())
```

Mypy could perhaps apply some clever heuristic to decide whether it should infer a join or a union, but that would probably end up being fairly confusing and difficult for end-users to predict.

This is also a pretty easy issue to work around in practice -- for example, you could just add an explicit annotation to your variable:

```python
from typing import Union, Sized, List

# If you want the union
xs: List[Union[int, str]] = [1, "1"]

# If you want any object with the `__len__` method
# (int has no __len__, so every element here must be sized)
ys: List[Sized] = ["1", [1, 2]]
```
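
The same annotation approach covers the case from the question where the element types share a method that their common base class lacks: declare a `typing.Protocol` (Python 3.8+) describing just that method, in the same way `Sized` describes `__len__`. This is a minimal sketch; `HasArea`, `Circle`, `Square`, `shapes` and `total_area` are hypothetical names.

```python
import math
from typing import List, Protocol

class HasArea(Protocol):
    """Structural type: any object with a compatible area() method matches."""
    def area(self) -> float: ...

# Hypothetical classes with no shared base class besides object.
class Circle:
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2

class Square:
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2

shapes: List[HasArea] = [Circle(1.0), Square(2.0)]

def total_area(items: List[HasArea]) -> float:
    # Type-checks because HasArea declares area(); no cast or isinstance needed.
    return sum(item.area() for item in items)
```

A Protocol is checked structurally, so `Circle` and `Square` don't need to inherit from `HasArea`; mypy only checks that each provides a compatible `area` method.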

So given these two factors, implementing some fancy heuristic or switching to inferring unions entirely (and disrupting a lot of existing code) doesn't really seem worth it.

Michael0x2a
  • Great answer, as usual. I always thought the behaviour was similar to Java's generics, e.g. the untyped `List` is an alias for `List<Object>`; it surprised me that `mypy` picks the common ancestor class instead. Thanks for pointing that out. – hoefling Aug 12 '19 at 19:52
  • @hoefling -- You're close -- the type `List` is actually an alias for `List[Any]`, where `Any` is the dynamic type. (The mypy docs have more info on `Any` vs `object` [here](https://mypy.readthedocs.io/en/latest/kinds_of_types.html#the-any-type) and [here](https://mypy.readthedocs.io/en/latest/dynamic_typing.html#dynamic-typing).) However, you never actually wrote `List` or any other type hint in your example, so this aliasing is irrelevant. Instead, the type checker is responsible for picking what type `xs` should have, and mypy generally biases towards inferring concrete, non-dynamic types. – Michael0x2a Aug 12 '19 at 20:34
  • It's worth noting this is a mypy-specific decision though -- PEP 484 doesn't actually mandate any particular inference strategy, so it'd be just as valid for a type checker to decide `xs` is type `List[Any]` or `List[Union[int, str]]` instead. For example, Facebook's [pyre](https://github.com/facebook/pyre-check) made the opposite decision: they bias towards inferring unions instead of joins. This is a general pattern for PEP 484 and the other typing PEPs -- they pin down in detail what a specific type hint means, but leave the actual inference and usage of those type hints up to individual type checkers. – Michael0x2a Aug 12 '19 at 20:36