Unfortunately, the pyyaml documentation is just horrendous, so seemingly elementary things like customizing (de-)serialization are a pain to figure out properly. But there are essentially two ways you could solve this.
Option A: Subclass AnyUrl and YAMLObject
You had the right idea of subclassing AnyUrl, but the __repr__ method is irrelevant for YAML serialization. For that you need to do three things:
- Inherit from YAMLObject,
- define a custom yaml_tag, and
- override the to_yaml classmethod.
Then pyyaml will serialize this custom class (that inherits from both AnyUrl and YAMLObject) in accordance with what you define in to_yaml.
The to_yaml method always receives exactly two arguments:
- A yaml.Dumper instance with built-in capabilities to serialize standard types (via methods like represent_str for example) and
- the actual data to be serialized.
To avoid adding/overriding additional methods, you can leverage the fact that AnyUrl inherits from str (in pydantic v1, which this answer uses) and that the underlying str.__new__ method actually receives the full URL during construction. Therefore the str.__str__ method will return that "as is".
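To see why calling str.__str__ directly sidesteps any customization on the subclass, here is a minimal sketch that uses only the standard library. NoisyUrl is a hypothetical stand-in for AnyUrl, not pydantic's actual implementation; its scheme attribute and its overridden __str__ are assumptions made purely for illustration.

```python
class NoisyUrl(str):
    """Hypothetical stand-in for AnyUrl: a str subclass with extra state."""

    def __new__(cls, url, *, scheme):
        # str.__new__ receives the full URL, so it becomes the string value.
        obj = super().__new__(cls, url)
        obj.scheme = scheme
        return obj

    def __repr__(self):
        return f"NoisyUrl({str.__repr__(self)}, scheme={self.scheme!r})"

    def __str__(self):
        # Deliberately unhelpful, to show that str.__str__ bypasses it.
        return f"<{self.scheme} url>"


url = NoisyUrl("https://www.example.com", scheme="https")

# Calling the base-class method directly ignores the overrides above.
raw = str.__str__(url)
print(raw)        # https://www.example.com
print(type(raw))  # <class 'str'>
```

The value handed to represent_str is therefore a plain str holding the original URL, regardless of how the subclass chooses to print itself.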
Full example:

    from pydantic import AnyUrl, BaseModel
    from yaml import Dumper, ScalarNode, YAMLObject, dump, safe_load

    class Url(AnyUrl, YAMLObject):
        yaml_tag = "!Url"

        @classmethod
        def to_yaml(cls, dumper: Dumper, data: str) -> ScalarNode:
            # Emit the underlying URL as an ordinary YAML string.
            return dumper.represent_str(str.__str__(data))

    class MyModel(BaseModel):
        foo: int = 0
        url: Url

    obj = MyModel.parse_obj({"url": "https://www.example.com"})
    print(obj)
    serialized = dump(obj.dict()).strip()
    print(serialized)
    # Round trip: the plain string is validated back into a Url by pydantic.
    deserialized = MyModel.parse_obj(safe_load(serialized))
    print(deserialized == obj and isinstance(deserialized.url, Url))
Output:

    foo=0 url=Url('https://www.example.com', scheme='https', host='www.example.com', tld='com', host_type='domain')
    foo: 0
    url: https://www.example.com
    True

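The choice of represent_str in to_yaml matters for the round trip through safe_load. The sketch below contrasts it with represent_scalar and the custom tag. PlainUrl and TaggedUrl are hypothetical str subclasses standing in for the AnyUrl-based class, so that the example does not depend on pydantic. The exact quoting of the tagged output is not asserted here, only that the tag appears.

```python
import yaml


class PlainUrl(str, yaml.YAMLObject):
    """Hypothetical stand-in: serialized as an ordinary YAML string."""

    yaml_tag = "!PlainUrl"

    @classmethod
    def to_yaml(cls, dumper, data):
        return dumper.represent_str(str.__str__(data))


class TaggedUrl(str, yaml.YAMLObject):
    """Hypothetical stand-in: serialized with its custom tag attached."""

    yaml_tag = "!TaggedUrl"

    @classmethod
    def to_yaml(cls, dumper, data):
        return dumper.represent_scalar(cls.yaml_tag, str.__str__(data))


plain_doc = yaml.dump({"url": PlainUrl("https://www.example.com")})
tagged_doc = yaml.dump({"url": TaggedUrl("https://www.example.com")})

print(plain_doc.strip())   # url: https://www.example.com
print(tagged_doc.strip())  # contains the !TaggedUrl tag

# safe_load reads the plain string, but has no constructor for the custom tag.
plain_loaded = yaml.safe_load(plain_doc)
try:
    yaml.safe_load(tagged_doc)
    tagged_rejected = False
except yaml.YAMLError:
    tagged_rejected = True
```

Emitting a plain string keeps the YAML readable by any loader and leaves the conversion back into a URL object to pydantic's validation.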
Option B: Register a representer function for AnyUrl
You can avoid defining your own subclass and instead globally register a function that defines how instances of AnyUrl should be serialized, by using the yaml.add_representer function.
That function takes two mandatory arguments:
- The class for which you want to define your custom serialization behavior and
- the representer function that defines that serialization behavior.
The representer function essentially has to have the same signature as the YAMLObject.to_yaml classmethod presented in option A, i.e. it takes a Dumper instance and the data to be serialized as arguments.

    from pydantic import AnyUrl, BaseModel
    from yaml import Dumper, ScalarNode, add_representer, dump, safe_load

    def url_representer(dumper: Dumper, data: AnyUrl) -> ScalarNode:
        # Emit the underlying URL as an ordinary YAML string.
        return dumper.represent_str(str.__str__(data))

    # Registers the function on the default Dumper for the whole program.
    add_representer(AnyUrl, url_representer)

    class MyModel(BaseModel):
        foo: int = 0
        url: AnyUrl

    obj = MyModel.parse_obj({"url": "https://www.example.com"})
    print(obj)
    serialized = dump(obj.dict()).strip()
    print(serialized)
    deserialized = MyModel.parse_obj(safe_load(serialized))
    print(deserialized == obj and isinstance(deserialized.url, AnyUrl))

Output is the same as with the code from option A.
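One detail worth knowing: add_representer matches on the exact type of the object, so a representer registered for a class is not picked up for its subclasses (for instance, if your models use a subclass of AnyUrl such as HttpUrl). PyYAML also offers add_multi_representer, which does match subclasses. The sketch below illustrates the difference with hypothetical str subclasses BaseUrl and HttpLikeUrl instead of the pydantic types, and with two throwaway SafeDumper subclasses used only to keep the two registrations apart.

```python
import yaml


class BaseUrl(str):
    """Hypothetical stand-in for AnyUrl."""


class HttpLikeUrl(BaseUrl):
    """Hypothetical stand-in for a subclass such as HttpUrl."""


def url_representer(dumper, data):
    return dumper.represent_str(str.__str__(data))


class ExactDumper(yaml.SafeDumper):
    """Only knows BaseUrl itself."""


class SubclassAwareDumper(yaml.SafeDumper):
    """Knows BaseUrl and everything derived from it."""


yaml.add_representer(BaseUrl, url_representer, Dumper=ExactDumper)
yaml.add_multi_representer(BaseUrl, url_representer, Dumper=SubclassAwareDumper)

payload = {"url": HttpLikeUrl("https://www.example.com")}

# The exact-type registration does not cover the subclass instance.
try:
    yaml.dump(payload, Dumper=ExactDumper)
    exact_match_failed = False
except yaml.YAMLError:
    exact_match_failed = True

# The multi representer walks the MRO and finds BaseUrl.
multi_doc = yaml.dump(payload, Dumper=SubclassAwareDumper)
print(multi_doc.strip())  # url: https://www.example.com
```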
The benefit of this approach is that it involves less code and avoids potential namespace collisions between the two parent classes in option A.
A potential drawback is that it modifies a global setting for the entire runtime of the program. That can become less transparent as your application grows, and it is something to be aware of in case you later decide you want to serialize AnyUrl objects differently.
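If the global side effect is a concern, one possible way to limit it is to register the representer on your own Dumper subclass and pass that subclass to dump explicitly. This is a sketch under assumptions, not part of the original solution: FakeUrl and UrlDumper are hypothetical names, and FakeUrl is a plain str subclass standing in for AnyUrl.

```python
import yaml


class FakeUrl(str):
    """Hypothetical stand-in for AnyUrl."""


def url_representer(dumper, data):
    return dumper.represent_str(str.__str__(data))


class UrlDumper(yaml.SafeDumper):
    """Hypothetical Dumper subclass that carries the custom representer."""


# Register on the subclass only; the stock dumpers are left untouched.
yaml.add_representer(FakeUrl, url_representer, Dumper=UrlDumper)

payload = {"foo": 0, "url": FakeUrl("https://www.example.com")}

scoped_doc = yaml.dump(payload, Dumper=UrlDumper)
print(scoped_doc.strip())

# The plain SafeDumper still has no idea how to represent FakeUrl.
try:
    yaml.dump(payload, Dumper=yaml.SafeDumper)
    base_unaffected = False
except yaml.YAMLError:
    base_unaffected = True
```

The trade-off is that every call site has to remember to pass Dumper=UrlDumper, whereas the global registration works with a bare dump call.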