
PyYAML can handle cyclic graphs in regular Python objects. For example:

Snippet #1.

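import yaml

class Node: pass
a = Node()
b = Node()
a.child = b
b.child = a
# We now have the cycle a->b->a
serialized_object = yaml.dump(a)
# Loader is passed explicitly because recent PyYAML versions require it
loaded = yaml.load(serialized_object, Loader=yaml.Loader)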

This code succeeds, so clearly there's some mechanism to prevent infinite recursion when loading the serialized object. How do I harness that when I write my own YAML constructor function?

For example, say Node is a class with transient fields foo and bar, and a non-transient field child. Only child should make it into the YAML document. I would hope to do this:

Snippet #2.

def representer(dumper, node):
    # Serialize only the child field, under the custom !node tag
    return dumper.represent_mapping("!node", {"child": node.child})

def constructor(loader, data):
    result = Node()
    mapping = loader.construct_mapping(data)
    result.child = mapping["child"]
    return result

yaml.add_representer(Node, representer)
yaml.add_constructor("!node", constructor)

# Retry object cycle a->b->a from earlier code snippet
serialized_object = yaml.dump(a)
print(serialized_object)
loaded = yaml.load(serialized_object, Loader=yaml.Loader)

But it fails:

&id001 !node
child: !node
  child: *id001

yaml.constructor.ConstructorError: found unconstructable recursive node:
  in "<string>", line 1, column 1:
    &id001 !node

I see why. My constructor function isn't built for recursion. It needs to return the child object before it finishes constructing the parent object, and that fails when the child and parent are the same object.

But clearly PyYAML has graph traversals that solve this problem, because Snippet #1 works. Maybe there's one pass to construct all the objects and a second pass to populate their fields. My question is, how can my custom constructor tie into those mechanisms?

An answer to that question would be ideal. But if the answer is that I can't do this with custom constructors, and there is a less desirable alternative (e.g. mixing the YAMLObject class into my Node class), then that answer would be appreciated too.

smci
Travis Wilson
2 Answers


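For complex types that might involve recursion (mapping/dict, sequence/list, objects), the constructor cannot create the object in one go. Yielding the still-empty object lets the loader register it before its children are constructed, so an alias pointing back to it can resolve. You should therefore yield the constructed object in the constructor() function, and then update any values after that¹: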

def constructor(loader, data):
    result = Node()
    yield result
    mapping = loader.construct_mapping(data)
    result.child = mapping["child"]

That gets rid of the error.

¹ I don't think this is documented anywhere. Without looking at py/constructor.py intensively while upgrading PyYAML to ruamel.yaml, I would not have known how to do this. A typical case of "read the source, Luke".
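
Putting the generator-based constructor together with the representer from the question gives a round trip that can be checked end to end. This is a minimal sketch, not an official recipe: the names represent_node and construct_node are made up for illustration, and Loader=yaml.Loader is an assumption chosen because recent PyYAML releases require an explicit Loader argument.

```python
import yaml


class Node(object):
    """Toy graph node; only `child` is meant to be serialized."""


def represent_node(dumper, node):
    # Emit only the persistent field, under the custom "!node" tag.
    return dumper.represent_mapping("!node", {"child": node.child})


def construct_node(loader, yaml_node):
    # Step 1: hand the still-empty object back to the loader, so that
    # an alias pointing at this node can already resolve to it.
    result = Node()
    yield result
    # Step 2: runs when the loader resumes the generator; fill in fields.
    mapping = loader.construct_mapping(yaml_node)
    result.child = mapping["child"]


# Register on explicit Dumper/Loader classes (an illustrative choice).
yaml.add_representer(Node, represent_node, Dumper=yaml.Dumper)
yaml.add_constructor("!node", construct_node, Loader=yaml.Loader)

# Build the cycle a -> b -> a from the question.
a = Node()
b = Node()
a.child = b
b.child = a

text = yaml.dump(a, Dumper=yaml.Dumper)
loaded = yaml.load(text, Loader=yaml.Loader)

# The cycle survives: two distinct objects that point at each other.
assert loaded.child is not loaded
assert loaded.child.child is loaded
```

The key point is the order inside construct_node: the object is yielded before construct_mapping is called, so by the time the nested alias is reached, the loader already has an object to hand back.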

Anthon
    This works great. The typical advice in this case can now be upgraded to _read stackoverflow Luke_ which really is more welcome advice. Thanks for this fix! – Travis Wilson May 26 '15 at 15:53

My first impression of PyYAML was that it was attempting to maintain some level of consistent interface/behavior with JSON (dumps/loads).

I learned and appreciated the JSON functionality because it was easy for me to read JSON into a dynamically constructed type. Yet I had issues with the JSON format itself, particularly the lack of support for multi-line strings and comments, and its poor readability.

Using PyYAML, I found it surprisingly difficult to deserialize YAML to a type. There seem to be many hoops to jump through, which I don't have the time or interest to learn. Consider the following code, which deserializes JSON to a type:

with open(file) as filereader:
    obj = json.load(filereader, object_hook=lambda d: namedtuple('X', d.keys())(*d.values()))

Via an object-loading hook, I can convert each dictionary into a namedtuple. Now, PyYAML is very good at converting YAML to dictionaries. I ended up applying this hack, where I flow from YAML file -> dictionary -> JSON string -> object, like the following:

json.loads(json.dumps(yaml.safe_load(filereader)), object_hook=lambda d: namedtuple('X', d.keys())(*d.values()))

This single line reads a YAML file into a typed object by means of an intermediary JSON translation. In my case it is a worthwhile hack because the alternatives are significantly more complex.
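
To make the JSON half of that round trip concrete, here is a small stdlib-only sketch. The dictionary config is a made-up stand-in for whatever the YAML loader would return, and to_namedtuple is a hypothetical helper name wrapping the same hook as the lambda above. It assumes every key is a valid Python identifier, since namedtuple rejects other field names.

```python
import json
from collections import namedtuple


def to_namedtuple(d):
    # Same idea as the lambda above: one namedtuple per decoded JSON object.
    # Assumes every key is a valid Python identifier.
    return namedtuple("X", d.keys())(*d.values())


# Stand-in for the dictionary a YAML loader would produce (made-up data).
config = {"name": "demo", "server": {"host": "localhost", "port": 8080}}

# dict -> JSON string -> object; the hook is applied to inner objects first,
# so nested dictionaries become nested namedtuples.
obj = json.loads(json.dumps(config), object_hook=to_namedtuple)

assert obj.name == "demo"
assert obj.server.host == "localhost"
assert obj.server.port == 8080
```

Note that this only works for data that json.dumps can serialize, so values such as dates produced by the YAML loader would need separate handling.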

nachonachoman