3

Suppose I have class hierarchy like this:

class SerializableWidget(object):
# some code

class WidgetA(SerilizableWidget):
# some code

class WidgetB(SerilizableWidget):
# some code

I want to be able to serialize instances of WidgetA and WidgetB (and potentially other widgets) to text files as json. Then, I want to be able to deserialize those, without knowing beforehand their specific class:

some_widget = deserielize_from_file(file_path) # pseudocode, doesn't have to be exactly a method like this

and some_widget needs to be constructed from the precise subclass of SerilizableWidget. How do I do this? What methods exactly do I need to override/implement in each of the classes of my hierarchy?

Assume all fields of the above classes are primitive types. How do I override some __to_json__ and __from_json__ methods, something like that?

Baron Yugovich
  • 3,843
  • 12
  • 48
  • 76
  • Who thinks this is too broad and downvoted me? How can I make it more specific? It's not productive! – Baron Yugovich Aug 22 '18 at 21:53
  • Have you looked at libraries like [Marshmallow](https://marshmallow.readthedocs.io/en/3.0/) for this? – ChrisGPT was on strike Aug 22 '18 at 21:54
  • Why would I need an extra library, can I not call json.dumps() and json.load() directly? All I should need to do is implement the right methods in my classes. What am I missing? – Baron Yugovich Aug 22 '18 at 21:55
  • You don't strictly _need_ a third-party library, but serialization is more complicated than you appear to think (hence the existence of Marshmallow and other similar libraries). I suggest looking at the link I provided and making your own decision. For example, Marshmallow wants you to define schemas and Django REST Framework wants you to build serializers. Why do you think both of those make you do manual work? – ChrisGPT was on strike Aug 22 '18 at 21:57
  • Do you have any idea how I would do it in the setup described above? Assuming all fields of those classes are primitive types. Thanks – Baron Yugovich Aug 22 '18 at 21:59
  • Can I use jsonpickle for this? What are the pros and cons of using marchmallow vs jsonpickle? – Baron Yugovich Aug 22 '18 at 22:04
  • You _can_ use `jsonpickle` for this if you want. The advantage of `jsonpickle` is that you can use it for a wide variety of classes, even classes that do various weird things (almost anything that works with `pickle`), and get something that's somewhat human-readable and can be passed around as JSON. Whether that's what you want or not is a different story. If your classes are rigidly defined according to some set of rules (e.g., they're all `@dataclass`es, and equality is fully defined by the dataclass fields), you can write something a lot simpler/safer. – abarnert Aug 22 '18 at 22:17

2 Answers2

4

You can solve this with many methods. One example is to use the object_hook and default parameters to json.load and json.dump respectively.

All you need is to store the class together with the serialized version of the object, then when loading you have to use a mapping of which class goes with which name.

The example below uses a dispatcher class decorator to store the class name and object when serializing, and look it up later when deserializing. All you need is a _as_dict method on each class to convert the data to a dict:

import json

@dispatcher
class Parent(object):
    def __init__(self, name):
        self.name = name

    def _as_dict(self):
        return {'name': self.name}


@dispatcher
class Child1(Parent):
    def __init__(self, name, n=0):
        super().__init__(name)
        self.n = n

    def _as_dict(self):
        d = super()._as_dict()
        d['n'] = self.n
        return d

@dispatcher
class Child2(Parent):
    def __init__(self, name, k='ok'):
        super().__init__(name)
        self.k = k

    def _as_dict(self):
        d = super()._as_dict()
        d['k'] = self.k
        return d

Now for the tests. First lets create a list with 3 objects of different types.

>>> obj = [Parent('foo'), Child1('bar', 15), Child2('baz', 'works')]

Serializing it will yield the data with the class name in each object:

>>> s = json.dumps(obj, default=dispatcher.encoder_default)
>>> print(s)
[
  {"__class__": "Parent", "name": "foo"},
  {"__class__": "Child1", "name": "bar", "n": 15},
  {"__class__": "Child2", "name": "baz", "k": "works"}
]

And loading it back generates the correct objects:

obj2 = json.loads(s, object_hook=dispatcher.decoder_hook)
print(obj2)
[
  <__main__.Parent object at 0x7fb6cd561cf8>, 
  <__main__.Child1 object at 0x7fb6cd561d68>,
  <__main__.Child2 object at 0x7fb6cd561e10>
]

Finally, here's the implementation of dispatcher:

class _Dispatcher:
    def __init__(self, classname_key='__class__'):
        self._key = classname_key
        self._classes = {} # to keep a reference to the classes used

    def __call__(self, class_): # decorate a class
        self._classes[class_.__name__] = class_
        return class_

    def decoder_hook(self, d):
        classname = d.pop(self._key, None)
        if classname:
            return self._classes[classname](**d)
        return d

    def encoder_default(self, obj):
        d = obj._as_dict()
        d[self._key] = type(obj).__name__
        return d
dispatcher = _Dispatcher()
nosklo
  • 217,122
  • 57
  • 293
  • 297
  • Thanks, exactly what I needed! Couple follow ups: 1) Is dispatcher needed? 2) Can I achieve the same with jsonpickle, what are the pros and cons? – Baron Yugovich Aug 22 '18 at 23:03
  • 1
    @BaronYugovich `dispatcher ` is just one of possible methods. I like to use it because it makes clear which classes are added and why - it limits what the json file can do. You decorate the classes you want, and nobody can handcraft a json file with classes I didn't plan to "expose". I don't have any experience with `jsonpickle`, the reason I don't use it on my projects is that it is "too magic"... I like explicit things, like defining which attributes I want to serialize and how I want them in the json file. That said, `jsonpickle` probably uses some similar approach to this example. – nosklo Aug 23 '18 at 15:05
  • @user2880391 `from some_module import dispatcher` if you put it in `some_module.py` – nosklo May 28 '19 at 13:11
0

I really liked @nosklo's answer, but I wanted to customize what the property value was for how the model type got saved, so I extended his code a little to add a sub-annotation.

(I know this isn't directly related to the question, but you can use this to serialize to json too since it produces dict objects. Note that your base class must use the @dataclass annotation to serialize correctly - otherwise you could adjust this code to define the __as_dict__ method like @nosklo's answer)

data.csv:

model_type, prop1
sub1, testfor1
sub2, testfor2

test.py:

import csv
from abc import ABC
from dataclasses import dataclass

from polymorphic import polymorphic


@polymorphic(keyname="model_type")
@dataclass
class BaseModel(ABC):
    prop1: str


@polymorphic.subtype_when_(keyval="sub1")
class SubModel1(BaseModel):
    pass


@polymorphic.subtype_when_(keyval="sub2")
class SubModel2(BaseModel):
    pass


with open('data.csv') as csvfile:
    reader = csv.DictReader(csvfile, skipinitialspace=True)
    for row_data_dict in reader:
        price_req = BaseModel.deserialize(row_data_dict)
        print(price_req, '\n\tre-serialized: ', price_req.serialize())

polymorphic.py:

import dataclasses
import functools
from abc import ABC
from typing import Type


# https://stackoverflow.com/a/51976115
class _Polymorphic:
    def __init__(self, keyname='__class__'):
        self._key = keyname
        self._class_mapping = {}

    def __call__(self, abc: Type[ABC]):
        functools.update_wrapper(self, abc)
        setattr(abc, '_register_subtype', self._register_subtype)
        setattr(abc, 'serialize', lambda self_subclass: self.serialize(self_subclass))
        setattr(abc, 'deserialize', self.deserialize)
        return abc

    def _register_subtype(self, keyval, cls):
        self._class_mapping[keyval] = cls

    def serialize(self, self_subclass) -> dict:
        my_dict = dataclasses.asdict(self_subclass)
        my_dict[self._key] = next(keyval for keyval, cls in self._class_mapping.items() if cls == type(self_subclass))
        return my_dict

    def deserialize(self, data: dict):
        classname = data.pop(self._key, None)
        if classname:
            return self._class_mapping[classname](**data)
        raise ValueError(f'Invalid data: {self._key} was not found or it referred to an unrecognized class')

    @staticmethod
    def subtype_when_(*, keyval: str):
        def register_subtype_for(_cls: _Polymorphic):
            nonlocal keyval
            if not keyval:
                keyval = _cls.__name__
            _cls._register_subtype(keyval, _cls)

            @functools.wraps(_cls)
            def construct_original_subclass(*args, **kwargs):
                return _cls(*args, **kwargs)

            return construct_original_subclass

        return register_subtype_for


polymorphic = _Polymorphic

Sample console output:

SubModel1(prop1='testfor1') 
    re-serialized:  {'prop1': 'testfor1', 'model_type': 'sub1'}
SubModel2(prop1='testfor2') 
    re-serialized:  {'prop1': 'testfor2', 'model_type': 'sub2'}
takanuva15
  • 1,286
  • 1
  • 15
  • 27