10

I have a Python dataclass with a field that can be passed one of several sequence types. To simplify, I'll stick with tuples and lists. __init__ converts the parameter to MyList.

from typing import Union
from dataclasses import dataclass, InitVar, field

class MyList(list):
    pass

@dataclass
class Struct:
    field: Union[tuple, list, MyList]

    def __post_init__(self):
        self.field = MyList(self.field)

What type should I use for the field declaration?

  • If I supply a union of all possible input types, the code does not document that field is always a MyList when accessed.
  • If I only supply the final MyList type, PyCharm complains when I pass Struct() a list.

I could instead use:

_field: InitVar[Union[tuple, list, MyList]] = None
field: MyList = field(init=False)

def __post_init__(self, _field):
    self.field = MyList(_field)

but this is tremendously ugly, especially when repeated across 3 fields. Additionally I have to construct a struct like Struct(_field=field) instead of Struct(field=field).
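
For reference, the InitVar variant above can be written out as a complete, runnable sketch. The names `_items` and `items` are hypothetical renames of `_field` and `field`, chosen only so that the attribute does not shadow `dataclasses.field` inside the class body (which would break as soon as a second `field(...)` call is added).

```python
from dataclasses import InitVar, dataclass, field
from typing import Union


class MyList(list):
    pass


@dataclass
class Struct:
    # Init-only pseudo-field: accepted by __init__, never stored on the instance.
    _items: InitVar[Union[tuple, list, MyList]]
    # Real field: excluded from __init__, always a MyList after construction.
    items: MyList = field(init=False)

    def __post_init__(self, _items):
        # InitVar values are passed to __post_init__ as arguments.
        self.items = MyList(_items)


s = Struct(_items=(1, 2, 3))
print(type(s.items).__name__, s)
```

The drawback described above is visible here: callers must write `Struct(_items=...)` rather than `Struct(items=...)`.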

In April 2018, "tm" commented on this issue on PyCharm's announcement: https://blog.jetbrains.com/pycharm/2018/04/python-37-introducing-data-class/#comment-323957

nyanpasu64
  • [This discussion on github](https://github.com/ericvsmith/dataclasses/issues/60) touches on the issue of converters, which seems to me what you're asking for, and why they are not part of dataclasses. Your current implementation with `InitVar` is the intended solution for your scenario. – Arne Oct 23 '18 at 06:52
  • One of my use cases for converters is converting user input (via YAML files) to a type-safe enum for my configuration struct. Maybe I'll accomplish this using a function which wraps the constructor and converts the input. As for my other usecase of converting ndarray to MyArray, I'll have to review my code to find a solution I like. – nyanpasu64 Oct 23 '18 at 07:09
  • I see. Well, one alternative would then be to switch to [attrs](https://www.attrs.org/en/stable/), which supports converters. And unrelated, but since you brought it up, `yaml.load` is unsafe for user input. Unless you can trust your users, you should use `yaml.safe_load`. – Arne Oct 23 '18 at 07:46
  • Oh yeah I use ruamel.yaml, I believe all loaders (including roundtrip) are safe except 'unsafe'. – nyanpasu64 Oct 23 '18 at 07:58
  • *Just don't use a dataclass*. – juanpa.arrivillaga Aug 10 '21 at 21:17
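
The "function which wraps the constructor" idea from the comments could look roughly like the sketch below. It uses only the standard library; `Mode`, `Config`, and `config_from_raw` are hypothetical names, and the plain dict stands in for whatever mapping a YAML loader would return.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Mapping


class Mode(Enum):  # hypothetical enum, for illustration only
    FAST = "fast"
    SAFE = "safe"


@dataclass
class Config:
    mode: Mode  # the stored value is always a Mode member


def config_from_raw(raw: Mapping[str, Any]) -> Config:
    """Convert loosely typed input (e.g. parsed YAML) into a Config."""
    # Mode("fast") looks the member up by value and raises ValueError
    # for an unknown string, so bad input fails at the boundary.
    return Config(mode=Mode(raw["mode"]))


cfg = config_from_raw({"mode": "fast"})
print(cfg.mode is Mode.FAST)
```

Keeping the conversion in the wrapper means the dataclass annotation can state the final type only.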

3 Answers

6

You are conflating assigning a value to the attribute with the code that produces the value to assign to the attribute. I would use a separate class method to keep the two pieces of code separate.

from dataclasses import dataclass


class MyList(list):
    pass


@dataclass
class Struct:
    field: MyList

    @classmethod
    def from_iterable(cls, x):
        return cls(MyList(x))


s1 = Struct(MyList([1,2,3]))
s2 = Struct.from_iterable((4,5,6))

Now, you only pass an existing value of MyList to Struct.__init__. Tuples, lists, and whatever else MyList can accept are passed to Struct.from_iterable instead, which will take care of constructing the MyList instance to pass to Struct.
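
With type hints added, the same split documents both facts the question asks about: the stored type and the accepted input types. This is a minimal sketch; the `Iterable[Any]` annotation is an assumption about what `MyList` should accept, not something stated in the original answer.

```python
from dataclasses import dataclass
from typing import Any, Iterable


class MyList(list):
    pass


@dataclass
class Struct:
    field: MyList  # stored type: always documented as MyList

    @classmethod
    def from_iterable(cls, x: Iterable[Any]) -> "Struct":
        # Accepted input type: any iterable; conversion happens here.
        return cls(MyList(x))


s = Struct.from_iterable((4, 5, 6))
print(type(s.field).__name__)
```

Note that dataclasses do not check annotations at runtime, so `Struct([1, 2])` would still store a plain list; the annotation only helps readers and static checkers.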

chepner
  • This is the approach that Rust takes, and if I were designing a Python program from scratch, I may attempt this approach. – nyanpasu64 Aug 17 '21 at 15:21
0

Have you tried a Pydantic BaseModel instead of a dataclass?

With the following code, my PyCharm does not complain:

from pydantic import BaseModel


class MyList(list):
    pass


class PydanticStruct(BaseModel):
    field: MyList

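    # Note: pydantic's BaseModel does not call __post_init__ (that is a
    # dataclasses hook), so this method never runs during construction.
    def __post_init__(self):
        self.field = MyList(self.field)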


a = PydanticStruct(field=['a', 'b'])
tlouarn
  • I'm not actively developing new Python code or this project anymore, but I'll look into Pydantic if I ever do work on Python again. – nyanpasu64 Aug 11 '21 at 05:35
0

dataclasses works best for straightforward data containers; advanced utilities such as conversion were consciously omitted (see here for a complete writeup of this and similar features). Implementing this would be a fair bit of work, since it should also include updating the PyCharm plugin so that it recognizes to what extent conversion is now supported.
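
To give a sense of what a hand-rolled converter involves, here is a minimal standard-library sketch using a descriptor-typed field. `Converted` is a hypothetical helper, not part of dataclasses, and a static checker may still flag passing a tuple because the annotation says `MyList`.

```python
from dataclasses import dataclass


class MyList(list):
    pass


class Converted:
    """Hypothetical descriptor that runs a converter on every assignment."""

    def __init__(self, converter):
        self.converter = converter

    def __set_name__(self, owner, name):
        # Store the converted value under a private instance attribute.
        self.storage_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Class-level access: raising AttributeError tells @dataclass
            # that the field has no default, so it stays required.
            raise AttributeError(self.storage_name)
        return getattr(obj, self.storage_name)

    def __set__(self, obj, value):
        setattr(obj, self.storage_name, self.converter(value))


@dataclass
class Struct:
    field: MyList = Converted(MyList)


s = Struct(field=(1, 2, 3))
print(type(s.field).__name__)
s.field = [4, 5]  # later assignments are converted as well
print(type(s.field).__name__)
```

This handles runtime conversion only; the IDE-side support mentioned above is the part that a sketch like this cannot provide.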

A much better approach would be to use one of the third-party libraries that already did this, the most popular one being pydantic, probably because it offers the easiest migration from dataclasses.


A native pydantic solution could look like this (it relies on the `__get_validators__` hook from pydantic v1), where the conversion code is part of MyList. Handling it that way makes the __post_init__ unnecessary, leading to cleaner model definitions:

import pydantic


class MyList(list):
    @classmethod
    def __get_validators__(cls):
        """Validators handle data validation, as well as data conversion.

        This function yields validator functions, with the last-yielded
        result being the final value of a pydantic field annotated with
        this class's type.
        Since we inherit from 'list', our constructor already supports
        building 'MyList' instances from iterables - if we didn't, we 
        would need to write that code by hand and yield it instead.
        """
        yield cls


class Struct(pydantic.BaseModel):
    field: MyList  # accepts any iterable as input


print(Struct(field=(1, 2, 3)))
# prints: field=[1, 2, 3]
Arne