
We have a number of dataclasses representing various results, all with a common ancestor, Result. Each result provides its data using its own subclass of ResultData, but we have trouble annotating this case properly.

We came up with the following solution:

from dataclasses import dataclass
from typing import ClassVar, Generic, Optional, Sequence, Type, TypeVar


class ResultData:
    ...


T = TypeVar('T', bound=ResultData)


@dataclass
class Result(Generic[T]):
    _data_cls: ClassVar[Type[T]]
    data: Sequence[T]

    @classmethod
    def parse(cls, ...) -> T:
        self = cls()
        self.data = [self._data_cls.parse(...)]
        return self

class FooResultData(ResultData):
    ...

class FooResult(Result):
    _data_cls = FooResultData

but it recently stopped working: mypy now reports the error ClassVar cannot contain type variables [misc]. The annotation also goes against PEP 526 (see https://www.python.org/dev/peps/pep-0526/#class-and-instance-variable-annotations), which we missed earlier.

Is there a way to annotate this case properly?

Braiam
ziima
  • why not `class FooResult(Result[FooResultData]):` and drop `_data_cls` entirely? Perhaps I've missed something that `_data_cls` does. Btw I think using `class FooResult(Result):` means your class isn't fully typed - it's equivalent to `class FooResult(Result[Any]):` – joel Jan 26 '22 at 14:04
  • @joel: I forgot to mention, the `_data_cls` is actually used. I've improved the example to be more specific. I can't quite drop it. – ziima Jan 31 '22 at 07:57

2 Answers

As hinted in the comments, the _data_cls attribute could be removed, assuming that it is only being used for type hinting purposes. The correct way to annotate a generic class defined like class MyClass(Generic[T]) is to use MyClass[MyType] in the type annotations.

For example, the code below should hopefully work in mypy. I only tested it in PyCharm, where the types at least seem to be inferred well enough.

from dataclasses import dataclass
from functools import cached_property
from typing import Generic, Sequence, TypeVar, Any, Type


T = TypeVar('T', bound='ResultData')


class ResultData:
    ...


@dataclass
class Result(Generic[T]):
    data: Sequence[T]

    @cached_property
    def data_cls(self) -> Type[T]:
        """Get generic type arg to Generic[T] using `__orig_class__` attribute"""
        # noinspection PyUnresolvedReferences
        return self.__orig_class__.__args__[0]

    def parse(self):
        print(self.data_cls)


@dataclass
class FooResultData(ResultData):
    # can be removed
    this_is_a_test: Any = 'testing'


class AnotherResultData(ResultData): ...


# indicates `data` is a list of `FooResultData` objects
FooResult = Result[FooResultData]

# indicates `data` is a list of `AnotherResultData` objects
AnotherResult = Result[AnotherResultData]

f: FooResult = FooResult([FooResultData()])
f.parse()
_ = f.data[0].this_is_a_test  # no warnings

# use a new name; re-annotating `f` with a different type is flagged by type checkers
a: AnotherResult = AnotherResult([AnotherResultData()])
a.parse()

Output:

<class '__main__.FooResultData'>
<class '__main__.AnotherResultData'>

And of course, here is proof that it seems to be working on my end: [screenshot omitted]
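One caveat with the `__orig_class__` approach is worth checking before relying on it: as far as I can tell, the attribute is only set when an instance is created through a subscripted alias such as `Result[FooResultData]`. The sketch below is a reduced version of the example above (the `data_cls` method returning `None` is my own simplification, not part of the original code) and compares the three ways of creating an instance:

```python
from dataclasses import dataclass
from typing import Generic, Optional, Sequence, TypeVar


class ResultData:
    ...


class FooResultData(ResultData):
    ...


T = TypeVar('T', bound=ResultData)


@dataclass
class Result(Generic[T]):
    data: Sequence[T]

    def data_cls(self) -> Optional[type]:
        # `__orig_class__` is attached by the typing machinery after the
        # instance is built via a subscripted alias; fall back to None otherwise.
        orig = getattr(self, '__orig_class__', None)
        return orig.__args__[0] if orig is not None else None


class FooResult(Result[FooResultData]):  # subclass instead of an alias
    ...


via_alias = Result[FooResultData]([FooResultData()])
via_plain = Result([FooResultData()])
via_subclass = FooResult([FooResultData()])

print(via_alias.data_cls())     # the FooResultData class
print(via_plain.data_cls())     # None: no subscript was used
print(via_subclass.data_cls())  # None: the subclass is not a subscripted alias
```

So the property only works for the alias style (`FooResult = Result[FooResultData]`), and since the attribute is attached after construction, it is not available inside `__init__` either.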

rv.kvetch
  • I can't just drop the `_data_cls`, since I used it in implementation. I improved the example. – ziima Jan 31 '22 at 07:59
  • I updated the example code, but it looks like you'll have to rewrite the way you're creating the generic type, unfortunately. For example, with the syntax `class FooResult(Result[FooResultData])`, you lose the generic argument `T` passed in to `Result[T]`, so there's no way AFAIK to access it. It does work, though, if you define `FooResult` as a generic type alias, as shown in the example. The one drawback is that you'll have to explicitly annotate the type in usage, since it seems IDEs like PyCharm aren't smart (or advanced?) enough to figure it out. – rv.kvetch Jan 31 '22 at 15:07
  • Thanks for the help, I discarded the property, but I fixed generic class annotations. – ziima Feb 03 '22 at 09:10
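For the subclass style `class FooResult(Result[FooResultData])`, one possible route not covered in this thread is the `__orig_bases__` attribute, which Python records on classes that inherit from a subscripted generic. The following is only a sketch under that assumption; the `data_cls` classmethod and its error message are illustrative names, not part of the original code:

```python
from dataclasses import dataclass
from typing import Generic, Sequence, Type, TypeVar, get_args, get_origin


class ResultData:
    ...


class FooResultData(ResultData):
    ...


T = TypeVar('T', bound=ResultData)


@dataclass
class Result(Generic[T]):
    data: Sequence[T]

    @classmethod
    def data_cls(cls) -> Type[ResultData]:
        # `__orig_bases__` keeps the subscripted bases, e.g. Result[FooResultData].
        for base in getattr(cls, '__orig_bases__', ()):
            if get_origin(base) is Result:
                (arg,) = get_args(base)
                if isinstance(arg, type):
                    return arg
        raise TypeError(f'{cls.__name__} does not bind a concrete ResultData type')


class FooResult(Result[FooResultData]):
    ...


print(FooResult.data_cls())
```

The return type is annotated with the base class `Type[ResultData]`, so a type checker will not narrow it to `FooResultData`; the lookup only recovers the class at runtime.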

In the end I just replaced the type variable in the _data_cls annotation with the base class and fixed the annotations of the subclasses, as noted by @rv.kvetch in his answer.

The downside is that the result data class has to be named twice in every subclass (once as the type argument and once as _data_cls), but in my opinion that is more legible than extracting the class in a property.

The complete solution:

from dataclasses import dataclass
from typing import ClassVar, Generic, Optional, Sequence, Type, TypeVar


class ResultData:
    ...


T = TypeVar('T', bound=ResultData)


@dataclass
class Result(Generic[T]):
    _data_cls: ClassVar[Type[ResultData]]  # Fixed annotation here
    data: Sequence[T]

    @classmethod
    def parse(cls, ...) -> T:
        self = cls()
        self.data = [self._data_cls.parse(...)]
        return self

class FooResultData(ResultData):
    ...

class FooResult(Result[FooResultData]):  # Fixed annotation here
    _data_cls = FooResultData
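The snippet above elides the `parse` arguments, so it does not run as written. Below is a runnable sketch of the same layout under some assumptions of mine: `ResultData` carries a single hypothetical `raw` field, both `parse` methods take plain strings, and `Result.parse` is annotated to return the result instance (via the helper type variable `R`) rather than `T`. None of these details come from the original code.

```python
from dataclasses import dataclass
from typing import ClassVar, Generic, Sequence, Type, TypeVar

D = TypeVar('D', bound='ResultData')


@dataclass
class ResultData:
    raw: str  # hypothetical payload field, for illustration only

    @classmethod
    def parse(cls: Type[D], raw: str) -> D:
        # Hypothetical parsing step: just strip whitespace.
        return cls(raw.strip())


T = TypeVar('T', bound=ResultData)
R = TypeVar('R', bound='Result')


@dataclass
class Result(Generic[T]):
    # Annotated with the base class, because ClassVar may not contain T.
    _data_cls: ClassVar[Type[ResultData]]
    data: Sequence[T]

    @classmethod
    def parse(cls: Type[R], raw_items: Sequence[str]) -> R:
        items = [cls._data_cls.parse(raw) for raw in raw_items]
        return cls(items)


@dataclass
class FooResultData(ResultData):
    ...


class FooResult(Result[FooResultData]):
    _data_cls = FooResultData


result = FooResult.parse([' a ', 'b'])
print(result)
```

Because `_data_cls` is typed as `Type[ResultData]`, a type checker cannot verify that it matches the type argument given to `Result[...]`; keeping the two in sync is left to the author of each subclass.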
ziima