When you traverse a nested Python data structure in order to convert it, you have to deal
with the possibility of self-reference; otherwise your code ends up in an endless loop as soon as the data refers to itself.
The way ruamel.yaml (and the standard library json.dump()) deals with that is by keeping a list of the id()s of the collection objects (everything you want to recurse into, so not primitives like int, float, str). If such an id() is already in the list, ruamel.yaml represents the first occurrence of that collection object as an anchor and the other occurrences as aliases, so it doesn't have to recurse into the object again (json.dump() tells you it cannot dump such a structure, but at least it doesn't hang).
The same mechanism (keeping track of id()s) is used in ruamel.yaml to not repeat the same collection when it is referenced in multiple other collections.
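To make the id() bookkeeping concrete, here is a minimal sketch that uses only the standard library. The helper count_collections() is a made-up name for illustration, not something taken from ruamel.yaml or json; it only shows the idea of remembering which collections you have already visited.

```python
import json

data = {"name": "root", "children": []}
data["children"].append(data)  # the dict now contains itself

try:
    json.dumps(data)
    circular_detected = False
except ValueError as exc:  # json refuses instead of recursing forever
    circular_detected = True
    print("json refused:", exc)


def count_collections(node, seen=None):
    """Count the distinct dicts/lists reachable from node, visiting each once."""
    if seen is None:
        seen = set()
    if not isinstance(node, (dict, list)):
        return 0  # primitives are never tracked
    if id(node) in seen:
        return 0  # already visited: do not recurse into it again
    seen.add(id(node))
    children = node.values() if isinstance(node, dict) else node
    return 1 + sum(count_collections(child, seen) for child in children)


print(count_collections(data))  # terminates despite the self-reference
```

Without the seen set the function would recurse until Python raises a RecursionError.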
pydantic doesn't seem to do that, hence the "written out" structure you get when calling library.dict().
I think that is the reason why the documentation tells you to use a string with a class name when dumping pydantic to JSON with self-referential data.
To get around this limitation of pydantic you could do two things:
- write an alternative to .dict() that returns a data structure that dumps to the required YAML document format, which means it needs to return a structure with the same data (dict) in multiple places.
- make sure you can dump your classes directly using ruamel.yaml, so you don't have to convert them.
But for both of these to work, the author that you add to book1 and book2 has to be the same object after adding, and it is not.
You cannot safely assume that two dicts with the same key/value pairs are the same object, so any comparison needs to be done using is and not using ==.
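A small illustration of that difference, using plain dicts with made-up author data:

```python
a = {"id": "auth1", "name": "John Smith"}
b = {"id": "auth1", "name": "John Smith"}  # equal content, separate object
c = a  # second reference to the very same object

print(a == b, a is b)  # equal, but not identical
print(a == c, a is c)  # equal and identical
print(id(a) == id(c), id(a) == id(b))
```

Only a and c would end up as an anchor/alias pair when dumped; b would be written out in full.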
After you pass in john_smith to the two calls of Book(), you don't have an attribute .author that points to the same data (i.e. has the same id()):
from pydantic import BaseModel
from typing import List


class Author(BaseModel):
    id: str
    name: str
    age: int


class Book(BaseModel):
    id: str
    title: str
    author: Author


class Library(BaseModel):
    authors: List[Author]
    books: List[Book]


john_smith = Author(id="auth1", name="John Smith", age=42)
books = [
    Book(id="book1", title="Some title", author=john_smith),
    Book(id="book2", title="Another one", author=john_smith),
]
library = Library(authors=[john_smith], books=books)
print('same author?', john_smith is library.books[0].author)
print('same author?', library.books[0].author is library.books[1].author)
which gives:
same author? False
same author? False
What you can do is force the authors to be identical, and then use something smarter than pydantic's .dict():
import sys
import ruamel.yaml


def gen_data(d, id_map=None):
    # id_map maps the id() of an already converted object to its result
    if id_map is None:
        id_map = {}
    d_id = id(d)
    if d_id in id_map:
        print('already found', id_map)
        return id_map[d_id]  # reuse the same object, so it dumps as an alias
    if isinstance(d, BaseModel):
        ret_val = {}
        for k, v in d:  # iterating a model yields (field name, value) pairs
            if k == 'author':
                print('auth', v, id(v))
            ret_val[k] = gen_data(v, id_map)
    elif isinstance(d, list):
        ret_val = []
        for elem in d:
            ret_val.append(gen_data(elem, id_map))
    else:
        return d  # should be primitive
    id_map[d_id] = ret_val
    return ret_val


# force authors to be the same
library.books[0].author = library.books[1].author = library.authors[0]
assert library.books[0].author is library.books[1].author
# alternative for .dict()
data = gen_data(library)
yaml = ruamel.yaml.YAML()
yaml.dump(data, sys.stdout)
and that results in what you wanted (preceded by the output of the debug print() calls):
auth id='auth1' name='John Smith' age=42 140494566559168
already found {140494566559168: {'id': 'auth1', 'name': 'John Smith', 'age': 42}, 140494576359168: [{'id': 'auth1', 'name': 'John Smith', 'age': 42}]}
auth id='auth1' name='John Smith' age=42 140494566559168
already found {140494566559168: {'id': 'auth1', 'name': 'John Smith', 'age': 42}, 140494576359168: [{'id': 'auth1', 'name': 'John Smith', 'age': 42}], 140494566559216: {'id': 'book1', 'title': 'Some title', 'author': {'id': 'auth1', 'name': 'John Smith', 'age': 42}}}
authors:
- &id001
  id: auth1
  name: John Smith
  age: 42
books:
- id: book1
  title: Some title
  author: *id001
- id: book2
  title: Another one
  author: *id001
Please note that you shouldn't import yaml, but instead instantiate a ruamel.yaml.YAML() instance.
If necessary, ruamel.yaml lets you set the name of the anchor/alias to something other than id001.