13

In Python, two sequences are typically concatenated with the + operator. However, mypy complains about the following:

from typing import Sequence

def concat1(a: Sequence, b: Sequence) -> Sequence:
    return a + b

And it's right: Sequence has no __add__. However, the function works perfectly fine for the "usual" sequence types list, str, tuple. Obviously, there are other types where + does not concatenate (e.g. numpy.ndarray, where it adds element-wise). A solution could be the following:

from itertools import chain

def concat2(a: Sequence, b: Sequence) -> Sequence:
    return list(chain(a, b))

Now, mypy doesn't complain. But concatenating strings or tuples always gives a list. There seems to be an easy fix:

def concat3(a: Sequence, b: Sequence) -> Sequence:
    T = type(a)
    return T(chain(a, b))

But now mypy is unhappy because the constructor for T gets too many arguments. Even worse, the function doesn't always return the concatenation anymore: for str, T(chain(a, b)) returns the string representation of the chain object.
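
Running concat3 on the three built-in sequence types shows what happens at runtime. This is only a small check, with the annotations left out because just the runtime behaviour matters here; the variable names are illustrative.

```python
from itertools import chain


def concat3(a, b):
    # Rebuild the result with the type of the first argument.
    T = type(a)
    return T(chain(a, b))


# list(iterable) and tuple(iterable) consume the chain as intended.
list_result = concat3([1, 2], [3])
tuple_result = concat3((1, 2), (3,))

# str(obj) does not iterate over its argument; it returns the
# string representation of the chain object instead.
str_result = concat3("ab", "c")

print(list_result)
print(tuple_result)
print(str_result)
```

So the approach works for list and tuple, but silently produces the wrong value for str.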

What is the proper way of doing this? I feel that part of the issue is that a and b should have the same type and that the output will be the same type too, but the type annotations don't convey it.

Note: I am aware that concatenating strings is more efficiently done using ''.join((a, b)). However, I picked this example more for illustration purposes.

Ingo
  • I'm aware of that question. However, it doesn't address the typing issues associated with this. – Ingo Dec 18 '20 at 17:22
  • 2
    A starting point would be to define a generic function: `T = TypeVar('T', bound=Sequence); def concat(a: T, b: T) -> T: ...`. However, the issue is that just because a type is a sequence doesn't guarantee that it has a constructor which takes an iterable as an argument. It's true for `list`, `tuple`, etc, but does not need to be true in general. – chepner Dec 18 '20 at 17:39
  • 3
    What you are trying to do is plain impossible. There are many sequences which simply *cannot* be concatenated in any way. For example, ``range`` is a sequence type but cannot be meaningfully concatenated to create a new ``range`` – all but the most basic cases break the invariants of ``range``. As long as your goal is "concatenate arbitrary ``Sequence`` types", different approaches will just create different errors. MyPy is unhappy because what you are doing is wrong, and it is MyPy's job to tell you that. – MisterMiyagi Dec 20 '20 at 09:06
  • There is a deeper problem with your type hints for `concat1`. According to the type hints, the call `concat1('abc', ['d', 'e', 'f'])` is valid, but Python cannot concatenate sequences of different types. Unfortunately, typing in Python is becoming more problematic than helpful in a number of practical cases. – wstomv Apr 06 '21 at 12:42

2 Answers

8

There is no general way to solve this: Sequence includes types which cannot be concatenated in a generic way. For example, there is no way to concatenate arbitrary range objects to create a new range and keep all elements.
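
A short illustration of the range case, as a sketch with arbitrarily chosen example values: the + operator is not defined for range, and the combined elements can only be stored in some other type.

```python
from itertools import chain

first = range(0, 3)      # 0, 1, 2
second = range(10, 12)   # 10, 11

# range does not implement +, so direct concatenation fails.
try:
    first + second
    add_supported = True
except TypeError:
    add_supported = False

# The elements can still be combined, but only into a different type:
# no single range(start, stop, step) produces 0, 1, 2, 10, 11.
combined = list(chain(first, second))

print(add_supported)
print(combined)
```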

One must decide on a concrete means of concatenation, and restrict the accepted types to those providing the required operations.

The simplest approach is for the function to request only the operations it needs. If the pre-built protocols in typing are not sufficient, one can fall back to defining a custom typing.Protocol for the requested operations.


Since concat1/concat_add requires the + implementation, a Protocol with __add__ is needed. Also, since addition usually works on similar types, __add__ must be parameterized over the concrete type – otherwise, the Protocol asks for all addable types that can be added to all other addable types.

from typing import Protocol, TypeVar

# TypeVar to parameterize for specific types
SA = TypeVar('SA', bound='SupportsAdd')


class SupportsAdd(Protocol):
    """Any type T where +(:T, :T) -> T"""
    def __add__(self: SA, other: SA) -> SA: ...


def concat_add(a: SA, b: SA) -> SA:
    return a + b

This is sufficient to type-safely concatenate the basic sequences, and reject mixed-type concatenation.

reveal_type(concat_add([1, 2, 3], [12, 17])) # note: Revealed type is 'builtins.list*[builtins.int]'
reveal_type(concat_add("abc", "xyz"))        # note: Revealed type is 'builtins.str*'
reveal_type(concat_add([1, 2, 3], "xyz"))    # error: ...

Be aware that this allows concatenating any type that implements __add__, for example int. If further restrictions are desired, define the Protocol more narrowly, for example by also requiring __len__ and __getitem__.
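
One possible narrower protocol is sketched below. The names SizedAddable, SC and concat_sized are hypothetical, and the use of runtime_checkable is an addition made only so that the restriction can also be observed at runtime; it has not been verified against every type checker version.

```python
from typing import Any, Protocol, TypeVar, runtime_checkable

# TypeVar bound to the (hypothetical) narrower protocol
SC = TypeVar("SC", bound="SizedAddable")


@runtime_checkable
class SizedAddable(Protocol):
    """Any type T with +(:T, :T) -> T that also supports len() and indexing."""

    def __add__(self: SC, other: SC) -> SC: ...

    def __len__(self) -> int: ...

    def __getitem__(self, index: int) -> Any: ...


def concat_sized(a: SC, b: SC) -> SC:
    return a + b


print(concat_sized([1, 2], [3]))
print(concat_sized("ab", "c"))
# isinstance only checks that the methods exist, not their signatures.
print(isinstance(1, SizedAddable))     # int has __add__ but no __len__
print(isinstance((1,), SizedAddable))
```

Since int has neither __len__ nor __getitem__, it no longer satisfies the protocol, while list, str and tuple still do.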


Typing concatenation via chaining is a bit more complex, but follows the same approach: A Protocol defines the capabilities needed by the function, but in order to be type-safe the elements should be typed as well.

from itertools import chain
from typing import Iterable, Iterator, Protocol, TypeVar

# TypeVar to parameterize for specific types and element types
C = TypeVar('C', bound='Chainable')
T = TypeVar('T', covariant=True)


# Parameterized by the element type T
class Chainable(Protocol[T]):
    """Any type C[T] where C[T](:Iterable[T]) -> C[T] and iter(:C[T]) -> Iterable[T]"""
    def __init__(self, items: Iterable[T]): ...

    def __iter__(self) -> Iterator[T]: ...


def concat_chain(a: C, b: C) -> C:
    cls = type(a)  # named cls so the local does not shadow the TypeVar T
    return cls(chain(a, b))

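This is sufficient to type-safely concatenate sequences that can be constructed from an iterable of their own elements, and to reject mixed-type concatenation and non-sequences. Note that passing the type check does not guarantee the intended runtime result: str(chain("abc", "xyz")) returns the string representation of the chain object, not "abcxyz".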

reveal_type(concat_chain([1, 2, 3], [12, 17])) # note: Revealed type is 'builtins.list*[builtins.int]'
reveal_type(concat_chain("abc", "xyz"))        # note: Revealed type is 'builtins.str*'
reveal_type(concat_chain([1, 2, 3], "xyz"))    # error: ...
reveal_type(concat_chain(1, 2))                # error: ...
MisterMiyagi
  • 1
    Does anything speak against including Sequence as a mixin to SupportsAdd? – Josiah Dec 30 '20 at 00:43
  • 2
    @Josiah In theory that would be correct. However, that would be an intersection type [which mypy (or other checkers) does not support (yet)](https://github.com/python/typing/issues/213). Concretely, ``Sequence`` cannot be mixed with ``Protocol`` since ``Protocol`` can only be mixed/inherited with other ``Protocol``s; ``Sequence`` and others are different beasts, in part due to them actually being Abstract Base Classes. – MisterMiyagi Dec 30 '20 at 11:08
  • Ah. Somehow I got into my head that Sequence was a Protocol, but right you are. – Josiah Jan 01 '21 at 10:29
6

Sequence does not support addition, so you cannot use Sequence. Instead, use a TypeVar that is constrained to the types you allow, or use overloading. Overloading is more general than needed here (though you may disagree), but you can read about it here: https://docs.python.org/3/library/typing.html#typing.overload. Let's just use a TypeVar:

from typing import TypeVar

ConcatableSequence = TypeVar('ConcatableSequence', list, str, tuple)

def concat1(a: ConcatableSequence, b: ConcatableSequence) -> ConcatableSequence:
    return a + b

Note here that when the type check runs, ConcatableSequence may be list, str, or tuple, but all three of a, b, and the return value must be the same choice, which differs from how Union would work.
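
For comparison, the overloading alternative mentioned above could look roughly like the sketch below. The names concat_overloaded and E are made up for illustration; the overloads additionally keep track of the element type, which the constrained TypeVar does not.

```python
from typing import List, Tuple, TypeVar, overload

E = TypeVar("E")  # element type, hypothetical name


@overload
def concat_overloaded(a: str, b: str) -> str: ...
@overload
def concat_overloaded(a: List[E], b: List[E]) -> List[E]: ...
@overload
def concat_overloaded(a: Tuple[E, ...], b: Tuple[E, ...]) -> Tuple[E, ...]: ...


def concat_overloaded(a, b):
    # Single runtime implementation; the overloads only guide the type checker.
    return a + b


print(concat_overloaded([1, 2], [3]))
print(concat_overloaded("ab", "c"))
print(concat_overloaded((1,), (2, 3)))
```

As with the TypeVar version, a mixed call such as concat_overloaded([1], "x") matches no overload for the type checker, and it raises TypeError at runtime.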

mCoding
  • Agreed regarding overloading. I was hoping there would be a more general solution that doesn't need to explicitly list those three types. – Ingo Dec 18 '20 at 18:42