
I'm working with a JSON data structure and am trying to represent it as a dataclass. The data structure is (partly) circular and I want the nested data structures to be neatly represented as dataclasses as well.

I am having some trouble getting the dataclasses to parse correctly. See the simplified example below:

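from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Union

from dataclasses_json import dataclass_json


class SchemaTypeName(Enum):
    LONG = "long"
    NULL = "null"
    RECORD = "record"
    STRING = "string"


@dataclass_json
@dataclass
class SchemaType:
    # `type` can be a type name, a nested schema, or a list mixing both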
    type: Union[
        SchemaTypeName,
        'SchemaType',
        List[
            Union[
                SchemaTypeName,
                'SchemaType'
            ]
        ]
    ]

    fields: Optional[List['SchemaType']] = None
    name: Optional[str] = None

Below is a printout of the object returned after calling from_dict with some sample data. Notice that the nested object (indicated with the arrow) is left as a plain dict instead of being parsed into a SchemaType, and the 'null' string is not converted to a SchemaTypeName either.

SchemaType(
    type=[
        'null', 
------> {
            'fields': [
                {'name': 'id', 'type': 'string'}, 
                {'name': 'date', 'type': ['null', 'long']}, 
                {'name': 'name', 'type': ['null', 'string']}
            ],
            'type': 'record'
        }
    ]
)

Am I declaring the type hint for the type field incorrectly?

I'm using Python 3.9 with dataclasses_json==0.5.2 and marshmallow==3.11.1.

thijsfranck
  • Hmm, you have a few assumptions wrong about how the dataclass-json library works: 1) The constructor or `__init__` method isn't automatically replaced, so validation and data transformation won't work that way normally. My understanding is that you need to use helper methods like `from_dict` instead. 2) You don't need to use both the `@dataclass_json` decorator and the `DataClassJsonMixin` subclass; using both is redundant, at least from what I understand. – rv.kvetch Dec 13 '21 at 18:44
  • Correct, I use `from_dict` in the unit test that produced the printout I included in the question. I use the decorator to pass some additional parameters that I left out of the example. The `DataClassJsonMixin` helps `mypy` to pick up on the extra methods such as `from_dict`, it doesn't pick up on those with just the decorator. – thijsfranck Dec 14 '21 at 05:03
  • Thank you for your feedback, I have clarified the question accordingly. – thijsfranck Dec 14 '21 at 05:31
  • Hmm, are you certain that the outermost encountered value for `type` will always be a SchemaType or SchemaTypeName, and that all subsequent values for it (for ex. nested within `fields`) will only be a SchemaTypeName? If so, there might be a simple solution for it. – rv.kvetch Dec 14 '21 at 08:58
  • `fields` can only be a list of `SchemaType` objects. `type` can be a `SchemaTypeName`, a `SchemaType`, or a `List` in which both types can be mixed. I think this mix is causing difficulty during the parsing step. – thijsfranck Dec 14 '21 at 16:22
  • Right, but what I'm asking is whether, in the specific data you're working with, any inner `type` values are a `SchemaType` or a list of `SchemaType`; depending on that, you could probably model your data slightly differently. – rv.kvetch Dec 14 '21 at 16:28

1 Answer

I found that the problem was that dataclasses_json does not decode my elements correctly when they are in a list. When a list contains mixed types, the decoder returns it as a list of plain strings and dicts instead of converting them to SchemaTypeName and SchemaType instances.

However, dataclasses_json lets you configure a custom decoder function for any individual field. To do this, import the config function from dataclasses_json, pass your decoder function to it as the decoder keyword argument, and supply the result of config as the metadata keyword argument of dataclasses.field.
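To make the per-field decoder idea concrete without depending on dataclasses_json, here is a simplified, stdlib-only sketch of the same mechanism: a decoder function stored in `dataclasses.field` metadata and applied by a small `from_dict` helper. The `Color` enum, the `Paint` class, the `"decoder"` metadata key and this `from_dict` function are all hypothetical stand-ins for illustration; dataclasses_json's own `config` stores its options under its own metadata key, and its real `from_dict` does much more.

```python
from dataclasses import dataclass, field, fields
from enum import Enum
from typing import Any, Callable, Dict, List, Union


class Color(Enum):
    # Hypothetical enum used only for this illustration
    RED = "red"
    BLUE = "blue"


def decode_colors(data: Union[str, List[str]]) -> Union[Color, List[Color]]:
    # Per-field decoder: a raw string (or list of strings) becomes enum member(s)
    if isinstance(data, list):
        return [Color(item) for item in data]
    return Color(data)


@dataclass
class Paint:
    # The decoder travels with the field in its metadata, similar in spirit
    # to field(metadata=config(decoder=...)) in dataclasses_json
    color: Union[Color, List[Color]] = field(metadata={"decoder": decode_colors})
    name: str = ""


def from_dict(cls: type, data: Dict[str, Any]) -> Any:
    # Hypothetical minimal loader: apply a field's registered decoder if present,
    # otherwise pass the raw JSON value through unchanged
    kwargs: Dict[str, Any] = {}
    for f in fields(cls):
        if f.name in data:
            decode: Callable[[Any], Any] = f.metadata.get("decoder", lambda value: value)
            kwargs[f.name] = decode(data[f.name])
    return cls(**kwargs)


paint = from_dict(Paint, {"color": ["red", "blue"], "name": "mix"})
print(paint)
```

The point of keeping the decoder in field metadata is that the dataclass definition itself carries the knowledge of how to turn raw JSON into typed values, so the loader stays generic.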

Please see the updated example below. Using the schemaTypeDecoder function, I am able to transform my data to the correct types.

from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Union

from dataclasses_json import config, dataclass_json

class SchemaTypeName(Enum):
    ARRAY = "array"
    LONG = "long"
    NULL = "null"
    OBJECT = "object"
    RECORD = "record"
    STRING = "string"


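def schemaTypeDecoder(data: Union[str, dict, List[Union[str, dict]]]):
    # Custom decoder for the `type` field; it receives the raw JSON value

    def transform(schemaType: Union[str, dict]):
        # Strings are primitive type names; dicts are nested schemas.
        # SchemaType is defined below, but the name is only looked up when this runs.
        if isinstance(schemaType, str):
            return SchemaTypeName(schemaType)
        else:
            return SchemaType.from_dict(schemaType)

    # A list may mix type names and nested schemas, so decode each element separately
    if isinstance(data, list):
        return [transform(schemaType) for schemaType in data]
    else:
        return transform(data)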


@dataclass_json()
@dataclass
class SchemaType:
    type: Union[
        SchemaTypeName,
        'SchemaType',
        List[
            Union[
                SchemaTypeName,
                'SchemaType'
            ]
        ]
    ] = field(
        metadata=config(
            decoder=schemaTypeDecoder
        )
    )

    fields: Optional[List['SchemaType']] = None
    name: Optional[str] = None
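As a quick check of the decoding logic, here is a stdlib-only sketch that mirrors `schemaTypeDecoder` and runs it on the sample data shown in the question's printout. It does not use dataclasses_json: the hand-written `from_dict` classmethod and the `schema_type_decoder` / `TypeValue` names are hypothetical stand-ins for what the library generates, and only the three fields from the example are handled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Optional, Union


class SchemaTypeName(Enum):
    ARRAY = "array"
    LONG = "long"
    NULL = "null"
    OBJECT = "object"
    RECORD = "record"
    STRING = "string"


TypeValue = Union[SchemaTypeName, "SchemaType", List[Union[SchemaTypeName, "SchemaType"]]]


def schema_type_decoder(data: Union[str, dict, list]) -> TypeValue:
    # Same dispatch as schemaTypeDecoder: strings become enum members,
    # dicts become SchemaType instances, lists are decoded element by element
    def transform(item: Union[str, dict]) -> Union[SchemaTypeName, "SchemaType"]:
        if isinstance(item, str):
            return SchemaTypeName(item)
        return SchemaType.from_dict(item)

    if isinstance(data, list):
        return [transform(item) for item in data]
    return transform(data)


@dataclass
class SchemaType:
    type: TypeValue
    fields: Optional[List["SchemaType"]] = None
    name: Optional[str] = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SchemaType":
        # Hypothetical hand-written replacement for the library's from_dict
        raw_fields = data.get("fields")
        return cls(
            type=schema_type_decoder(data["type"]),
            fields=[cls.from_dict(f) for f in raw_fields] if raw_fields is not None else None,
            name=data.get("name"),
        )


# Sample data reconstructed from the printout in the question
sample = {
    "type": [
        "null",
        {
            "fields": [
                {"name": "id", "type": "string"},
                {"name": "date", "type": ["null", "long"]},
                {"name": "name", "type": ["null", "string"]},
            ],
            "type": "record",
        },
    ]
}

schema = SchemaType.from_dict(sample)
print(schema.type[1].fields[1])
```

The decoder dispatches on the runtime type of each value (str, dict, or list), which is the information needed to pick the right member of the `Union`; per the answer, dataclasses_json 0.5.2 did not work this out from the annotation alone.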
thijsfranck