
Let us say I have a custom use case, and I need to dynamically create or define the __init__ method for a dataclass.

For example, say I need to decorate the class with @dataclass(init=False) and then modify the __init__() method to take keyword arguments, like **kwargs. However, in the kwargs object I only check for the presence of known dataclass fields, and set those attributes accordingly (example below).

I would like to type hint to my IDE (PyCharm) that the modified __init__ only accepts the listed dataclass fields as parameters or keyword arguments. I am unsure if there is a way to approach this, using the typing library or otherwise. I know that Python 3.11 has dataclass transforms planned (PEP 681, typing.dataclass_transform), which may or may not do what I am looking for (my gut feeling is no).

Here is some sample code I was playing around with, a basic case that illustrates the problem I am having:

from dataclasses import dataclass


# get value from input source (can be a file or anything else)
def get_value_from_src(_name: str, tp: type):
    return tp()  # dummy value


@dataclass
class MyClass:
    foo: str
    apple: int

    def __init__(self, **kwargs):
        for name, tp in self.__annotations__.items():
            if name in kwargs:
                value = kwargs[name]
            else:
                # here is where I would normally have the logic
                # to read the value from another input source
                value = get_value_from_src(name, tp)
                if value is None:
                    raise ValueError

            setattr(self, name, value)


c = MyClass(apple=None)
print(c)

c = MyClass(foo='bar',  # here, I would like to auto-complete the name
                        # when I start typing `apple`
            )
print(c)

If we assume that the number and names of the fields are not fixed, I am curious whether there could be a generic approach which would basically say to type checkers, "the __init__ of this class accepts only (optional) keyword arguments that match the fields defined in the dataclass itself".


Addendums, based on notes in comments below:

  • Passing @dataclass(kw_only=True) won't work: imagine I am writing this for a library and need to support Python 3.7+ (kw_only was only added in Python 3.10). Also, kw_only has no effect when a custom __init__() is implemented, as in this case.

  • The above is just a stub __init__ method. It could have more complex logic, such as setting attributes based on a file source. Basically, the above is just a sample implementation of a larger use case.

  • I can't update each field to foo: Optional[str] = None because that part would be implemented in user code, which I would not have any control over. Also, annotating it in this way doesn't make sense when you know a custom __init__() method will be generated for you - meaning not by dataclasses. Lastly, setting a default for each field just so that the class can be instantiated without arguments, like MyClass(), doesn't seem like the best idea to me.

  • It would not work to let dataclasses auto-generate an __init__ and implement a __post_init__() instead. I need to be able to construct the class without arguments, like MyClass(), as the field values will be set from another input source (think local file or elsewhere); this means that all fields would be required, so annotating them as Optional would be fallacious in this case. I still need to let users pass optional keyword arguments, but these **kwargs will always match up with dataclass field names, so I would like some way for auto-completion to work with my IDE (PyCharm).

Hope this post clarifies the expectations and desired result. If there are any questions or anything that is a bit vague, please let me know.

rv.kvetch
  • If you're just doing this to have keyword-only arguments, use `@dataclass(kw_only=True)`. – user2357112 Sep 19 '22 at 20:20
  • @user2357112 I can't use `kw_only` for a couple different reasons. for ex, `kw_only` still makes all params as required, but i need them as optional. also, `kw_only` doesn't work if we dynamically generate an `__init__` method anyway. – rv.kvetch Sep 19 '22 at 20:24
  • Optionality is an entirely separate issue. If you want your parameters to be optional, you need to give your fields default values. For example, `foo` should be declared as `foo: Optional[str] = None` (with `Optional` imported from `typing`), not `foo: str`. – user2357112 Sep 19 '22 at 20:26
  • @user2357112 the `field = None` would not have any impact, as the dataclass decorator is not applying the `__init__` method. besides, it does not help at all with type hinting and autocompletion if I add it. – rv.kvetch Sep 19 '22 at 20:27
  • It looks like the only reason you're generating your own `__init__` is to replicate functionality that `dataclass` could provide for you if you didn't generate your own `__init__`. – user2357112 Sep 19 '22 at 20:28
  • This is just a stub `__init__` method. it could have more complex logic, such as setting attributes based on a file source for example. basically the above is just a sample implementation of a larger use case, but yes, unfortunately it wouldn't work for dataclasses to auto-generate an `__init__` method as here it contains custom logic for setting attributes. – rv.kvetch Sep 19 '22 at 20:29
  • Okay, but it sounds like most of that would be better handled by `__post_init__`. You can let `dataclass` generate an `__init__` for you and get all the IDE autocompletion benefits you're looking for. – user2357112 Sep 19 '22 at 20:32
  • I cannot, as I need to be able to construct the class with no arguments, for ex. like `MyClass()`. I also cannot update to make all fields as Optional either. – rv.kvetch Sep 19 '22 at 20:35
  • @user2357112 I updated question based on notes that were discussed, but essentially the custom `__init__` method would need to stay. – rv.kvetch Sep 21 '22 at 03:45
  • Look, I like re-inventing the wheel as much as the next guy, but aside from that, it seems to me that your requirements are just inconsistent. Even _if_ there was a way to magically dynamically annotate your `__init__` method as you want, the type annotations on your class attributes would still be wrong. `foo: str` means that `foo` is expected to be a string and **never** `None`. So those type hints are already wrong to begin with. Like it or not, `typing.Optional[str]` or `str | None` is the only correct way, if an instance's `foo` attribute can be both `str` and `None`. – Daniil Fajnberg Sep 22 '22 at 19:56
  • Good point, I think the simplified example I had was kind of inconsistent and giving the wrong idea. I’ll see if I can update it. – rv.kvetch Sep 22 '22 at 21:05
  • @DaniilFajnberg done – rv.kvetch Sep 22 '22 at 21:07

2 Answers


What you are describing is impossible in theory and unlikely to be viable in practice.

TL;DR

Type checkers don't run your code, they just read it. A dynamic type annotation is a contradiction in terms.

Theory

As I am sure you know, the term static type checker is not coincidental. A static type checker does not execute the code you write. It just parses it and infers types according to its own internal logic by applying certain rules to a graph that it derives from your code.

This is important because unlike some other languages, Python is dynamically typed, which as you know means that the type of a "thing" (variable) can completely change at any point. In general, there is theoretically no way of knowing the type of all variables in your code, without actually stepping through the entire algorithm, which is to say running the code.

As a silly but illustrative example, you could decide to put the name of a type into a text file to be read at runtime and then used to annotate some variable in your code. Could you do that with valid Python code and typing? Sure. But I think it is beyond clear, that static type checkers will never know the type of that variable.
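To make that concrete, here is a minimal sketch of the idea. The "file" is faked with a hard-coded string so the snippet runs on its own; the names `type_name_from_file` and `runtime_type` are made up for illustration.

```python
import builtins

# Hypothetical stand-in for a text file read at runtime; the content is
# hard-coded here only so that the sketch is self-contained.
type_name_from_file = "int"

# Resolve the name to a real type object. This only happens when the code runs.
runtime_type = getattr(builtins, type_name_from_file)

# Perfectly valid Python, but a static checker cannot evaluate this
# annotation without executing the lines above.
value: runtime_type = runtime_type("42")

print(type(value).__name__)
```

At runtime the annotation resolves without complaint, but a tool that only reads the source has nothing to go on.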

Why your proposition won't work

Abstracting away all the dataclass stuff and the possible logic inside your __init__ method, what you are asking boils down to the following.

"I want to define a method (__init__), but the types of its parameters will only be known at runtime."

Why am I claiming that? I mean, you do annotate the types of the class' attributes, right? So there you have the types!

Sure, but these have -- in general -- nothing whatsoever to do with the arguments you could pass to the __init__ method, as you yourself point out. You want the __init__ method to accept arbitrary keyword-arguments. Yet you also want a static type checker to infer which types are allowed/expected there.

To connect the two (attribute types and method parameter types), you could of course write some kind of logic. You could even implement it in a way that enforces adherence to those types. That logic could read the type annotations of the class attributes, match up the **kwargs and raise TypeError if one of them doesn't match up. This is entirely possible and you almost implemented that already in your example code. But this only works at runtime!
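A rough sketch of what such runtime logic might look like is below. The class name `Checked` is made up, `tp()` is just a dummy fallback like in your example, and the `isinstance` check assumes plain classes as annotations (it would not handle things like `Optional[str]` or inherited fields).

```python
from dataclasses import dataclass


@dataclass(init=False)
class Checked:
    foo: str
    apple: int

    def __init__(self, **kwargs):
        annotations = type(self).__annotations__

        # Reject keyword arguments that do not correspond to a declared field.
        unknown = set(kwargs) - set(annotations)
        if unknown:
            raise TypeError(f"unexpected keyword argument(s): {sorted(unknown)}")

        for name, tp in annotations.items():
            # Fall back to a dummy value when the field was not passed.
            value = kwargs[name] if name in kwargs else tp()
            # Enforce the annotated type; this check only exists at runtime.
            if not isinstance(value, tp):
                raise TypeError(
                    f"{name!r} must be {tp.__name__}, got {type(value).__name__}"
                )
            setattr(self, name, value)


print(Checked(foo="bar"))
```

Calling `Checked(apple="x")` or `Checked(banana=1)` raises `TypeError`, but only once the line is actually executed. Nothing in the source tells a static tool which keywords are allowed.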

Again, a static type checker has no way to infer that, especially since your desired class is supposed to just be a base class and any descendant can introduce its own attributes/types at any point.

But dataclasses work, don't they?

You could argue that this dynamic way of annotating the __init__ method works with dataclasses. So why are they so different? Why can they be inferred correctly, while your proposed code can't?

The answer is, they aren't.

Even dataclasses don't have any magical way of telling a static type checker which parameter types the __init__ method is to expect, even though they do annotate them, when they dynamically construct the method in _init_fn.
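You can check the runtime side of this yourself. The sketch below reuses `MyClass` from the question and inspects the generated method with the standard `inspect` module:

```python
import inspect
from dataclasses import dataclass


@dataclass
class MyClass:
    foo: str
    apple: int


# The generated __init__ carries real annotations, but they only come into
# existence once the decorator has run, i.e. at runtime.
signature = inspect.signature(MyClass.__init__)
print(signature)

for name, param in signature.parameters.items():
    print(name, param.annotation)
```

So the information is there, but only for code that actually imports and executes the module.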

The only reason mypy correctly infers those types is that the mypy developers implemented a separate plugin just for dataclasses. In other words, it works because they read through PEP 557 and hand-crafted a plugin for mypy that specifically facilitates type inference based on the rules described there.

You can see the magic happening in the DataclassTransformer.transform method. You cannot generalize this behavior to arbitrary code, which is why they had to write a whole plugin just for this.

I am not familiar enough with how PyCharm does its type checking, but I strongly suspect they used something similar.

So you could argue that dataclasses are "cheating" with regard to static type checking. Though I am certainly not complaining.

Pragmatic solution

Even something as "high-profile" as Pydantic, which I personally love and use extensively, requires its own mypy plugin to realize the __init__ type inference properly (see here). For PyCharm they have their own separate Pydantic plugin, without which the internal type checker cannot provide those nice auto-suggestions for initialization etc.

That approach would be your best bet, if you really want to take this further. Just be aware that this will be (in the best sense of the word) a hack to allow specific type checkers to catch "errors" that they otherwise would have no way of catching.

The reason I argue that it is unlikely to be viable is because it will essentially blow up the amount of work for your project to also cover the specific hacks for those type checkers that you want to satisfy. If you are committed enough and have the resources, go for it.

Conclusion

I am not trying to discourage you. But it is important to know the limitations enforced by the environment. It's either dynamic types and hacky imperfect type checking (still love mypy), or static types and no "kwargs can be anything" behavior.
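For comparison, the fully static end of that trade-off could look roughly like the sketch below. It is only an illustration under my own assumptions: `get_value_from_src` is the dummy helper from the question, and using `None` as the "not passed" marker is my choice, not something your library has to do. The obvious downside is that every field has to be spelled out by hand.

```python
from dataclasses import dataclass
from typing import Optional


# Dummy helper from the question: stands in for reading from a file etc.
def get_value_from_src(name: str, tp: type):
    return tp()


@dataclass(init=False)
class MyClass:
    foo: str
    apple: int

    # Explicit keyword-only parameters: static tools can read these directly,
    # so auto-completion works without any plugin.
    def __init__(self, *, foo: Optional[str] = None, apple: Optional[int] = None):
        # None means "not passed", so fall back to the other input source.
        self.foo = foo if foo is not None else get_value_from_src("foo", str)
        self.apple = apple if apple is not None else get_value_from_src("apple", int)


print(MyClass())
print(MyClass(foo="bar"))
```

The field annotations stay non-optional, while only the parameters of __init__ are optional.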

Hope this makes sense. Please let me know, if I made any errors. This is just based on my understanding of typing in Python.

Daniil Fajnberg
  • Thanks, this is a good post, a bit to read through but I def appreciate the write-up. It looks like building a custom plugin for my IDE, similar to how Pydantic does it, is probably the best option in my case for now. I was hoping to avoid a solution like this which seems like it could end up being a bit over-engineered, however I'm def up for the challenge, and whenever I get some free time I'll probably look into tackling writing my own custom plugin strictly for my autocompletion purposes. – rv.kvetch Sep 27 '22 at 15:03
  • @rv.kvetch Glad I could help. Sorry for rambling a bit. Good luck with your project. – Daniil Fajnberg Sep 30 '22 at 15:21
  • just formally accepted this answer now, and unfortunately I was a bit late in marking the answer, so looks like the full bounty might not have been awarded :-( – rv.kvetch Sep 30 '22 at 17:47

For

It would not work to let dataclasses auto-generate an __init__, and instead implement a __post_init__(). This would not work because I need to be able to construct the class without arguments, like MyClass(), as the field values will be set from another input source (think local file or elsewhere); this means that all fields would be required, so annotating them as Optional would be fallacious in this case. I still need to be able to support user to enter optional keyword arguments, but these **kwargs will always match up with dataclass field names, and so I desire some way for auto-completion to work with my IDE (PyCharm)

dataclasses.field + default_factory can be a solution.
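A minimal sketch of what I mean, assuming the fields can be declared this way. `get_value_from_src` is the dummy helper from the question; the lambdas are just one way to pass the field name and type along.

```python
from dataclasses import dataclass, field


# Dummy helper from the question: stands in for reading from a file etc.
def get_value_from_src(name: str, tp: type):
    return tp()


@dataclass
class MyClass:
    # The factory only runs when the caller does not pass the argument.
    foo: str = field(default_factory=lambda: get_value_from_src("foo", str))
    apple: int = field(default_factory=lambda: get_value_from_src("apple", int))


print(MyClass())
print(MyClass(foo="bar"))
```

Here dataclasses still generates the __init__, so MyClass() works without arguments and the IDE knows the parameter names.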

But, it seems that dataclass field declarations are implemented in user code:

I can't update each field to foo: Optional[str] = None because that part would be implemented in user code, which I would not have any control over. Also, annotating it in this way doesn't make sense when you know a custom __init__() method will be generated for you - meaning not by dataclasses. Lastly, setting a default for each field just so that the class can be instantiated without arguments, like MyClass(), doesn't seem like the best idea to me.

If your IDE supports ParamSpec, there is a workaround. It is not strictly correct (it does not pass a static type checker), but it does give you auto-completion:

# ParamSpec requires Python 3.10+ (older versions can use typing_extensions)
from typing import Callable, Iterable, TypeVar, ParamSpec

from dataclasses import dataclass

T = TypeVar('T')
P = ParamSpec('P')

# user defined dataclass
@dataclass
class MyClass:
    foo: str
    apple: int


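# Declares that the wrapper has the same parameters (P) as the factory, which
# is what gives auto-completion. That is not strictly true: at runtime every
# argument becomes optional.
def wrap(factory: Callable[P, T], annotations: Iterable[tuple[str, type]]) -> Callable[P, T]:
    def default_factory(**kwargs):
        # Fill in a dummy default for every field the caller left out.
        for name, type_ in annotations:
            kwargs.setdefault(name, type_())
        return factory(**kwargs)
    return default_factory

WrappedMyClass = wrap(MyClass, MyClass.__annotations__.items())
WrappedMyClass()  # Okay at runtime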

Demo

YouJiacheng
  • The `ParamSpec` is a good idea, unfortunately it doesn't look like PyCharm supports it *yet* (I'm wondering if I should open a ticket for that, but hopefully they'll get around it in a future release). I don't see an autocomplete option similar to how you had in the screenshot above, however it's really neat that you do on your IDE at least. – rv.kvetch Sep 27 '22 at 14:59
  • FYI, my IDE is Visual Studio Code. – YouJiacheng Sep 27 '22 at 18:30