How to Parse YAML Using PyYAML if there are '!' within the YAML

Question

I have a YAML file from which I'd like to parse only the description variable; however, the exclamation points in my CloudFormation template (a YAML file) are giving PyYAML trouble.

I am receiving the following error:

yaml.constructor.ConstructorError: could not determine a constructor for the tag '!Equals'

The file has many !Ref and !Equals tags. How can I ignore these constructors and get only the specific variable I'm looking for, in this case the description variable?
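For reference, here is a minimal reproduction of that error. It is a sketch: the one-line template is illustrative (the same `Conditions` line used in the accepted answer), not the asker's actual file. Plain `yaml.safe_load` has no constructor for CloudFormation's short-form tags, so it raises `ConstructorError` on the first one it meets.

```python
import yaml

# Illustrative snippet using CloudFormation's short-form !Equals and !Ref tags
template = """\
Conditions:
  CreateNewSecurityGroup: !Equals [!Ref ExistingSecurityGroup, NONE]
"""

try:
    yaml.safe_load(template)
    message = None
except yaml.constructor.ConstructorError as exc:
    # SafeLoader only knows the standard YAML tags, so the first custom tag fails
    message = exc.problem

print(message)
```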

Possible duplicate of [Parse an AWS CloudFormation template with a YAML library](https://stackoverflow.com/questions/50914422/parse-an-aws-cloudformation-template-with-a-yaml-library) — Alex Harvey, Mar 26 '19 at 11:53

Anthon · Answer 1 · 2018-09-09T10:40:43.570

If you have to deal with a YAML document with multiple different tags and are only interested in a subset of them, you should still handle them all. If the elements you are interested in are nested within other tagged constructs, you at least need to handle all of the "enclosing" tags properly.

There is, however, no need to handle all of the tags individually: you can write one constructor routine that handles mappings, sequences and scalars, and register it with PyYAML's SafeLoader using:

import yaml

inp = """\
MyEIP:
  Type: !Join [ "::", [AWS, EC2, EIP] ]
  Properties:
    InstanceId: !Ref MyEC2Instance
"""

description = []

def any_constructor(loader, tag_suffix, node):
    # build the plain (untagged) value for a tagged mapping, sequence or scalar
    if isinstance(node, yaml.MappingNode):
        return loader.construct_mapping(node)
    if isinstance(node, yaml.SequenceNode):
        return loader.construct_sequence(node)
    return loader.construct_scalar(node)

# the empty prefix '' matches any tag that has no constructor of its own
yaml.add_multi_constructor('', any_constructor, Loader=yaml.SafeLoader)

data = yaml.safe_load(inp)
print(data)

which gives:

{'MyEIP': {'Type': ['::', ['AWS', 'EC2', 'EIP']], 'Properties': {'InstanceId': 'MyEC2Instance'}}}

(inp can also be a file opened for reading).

As you can see, the code above will also continue to work if an unexpected !Join tag shows up in your code, as well as any other tag like !Equals. The tags are just dropped.
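Applied to the `!Equals`/`!Ref` line from the question, the same multi-constructor simply strips the tags. The sketch below is a variation rather than the exact code above: it registers `any_constructor` on a throwaway `SafeLoader` subclass (the name `CfnLoader` is made up) instead of on `yaml.SafeLoader`, so that plain `yaml.safe_load` elsewhere in the program is not affected.

```python
import yaml

class CfnLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass, so yaml.SafeLoader itself is not modified."""

def any_constructor(loader, tag_suffix, node):
    # Called for any tag without a regular constructor; build the value, drop the tag
    if isinstance(node, yaml.MappingNode):
        return loader.construct_mapping(node)
    if isinstance(node, yaml.SequenceNode):
        return loader.construct_sequence(node)
    return loader.construct_scalar(node)

# The empty prefix '' matches every tag, so tag_suffix is the full tag, e.g. '!Equals'
CfnLoader.add_multi_constructor('', any_constructor)

template = """\
Conditions:
  CreateNewSecurityGroup: !Equals [!Ref ExistingSecurityGroup, NONE]
"""
data = yaml.load(template, Loader=CfnLoader)
print(data)
# {'Conditions': {'CreateNewSecurityGroup': ['ExistingSecurityGroup', 'NONE']}}
```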

Since there are no variables in YAML, it is a bit of guesswork what you mean by "like to parse the description variable only". If that value has an explicit tag (e.g. !Description), you can collect those values by adding 2-3 lines to any_constructor that match on the tag_suffix parameter:

    if tag_suffix == u'!Description':
        description.append(loader.construct_scalar(node))
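Put together with the tag-dropping constructor, the tag-matching variant could look like the sketch below. `!Description` is the hypothetical tag from the paragraph above (it is not a CloudFormation intrinsic function), the sample document is made up, and `CfnLoader` is an assumed `SafeLoader` subclass. Returning the scalar from the `!Description` branch keeps the value in the loaded data as well.

```python
import yaml

class CfnLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass, so yaml.SafeLoader itself is not modified."""

description = []  # values of every node tagged !Description

def any_constructor(loader, tag_suffix, node):
    if tag_suffix == '!Description':
        # Record the tagged scalar, then return it so it also stays in the data
        value = loader.construct_scalar(node)
        description.append(value)
        return value
    # Any other unknown tag: build the plain value and drop the tag
    if isinstance(node, yaml.MappingNode):
        return loader.construct_mapping(node)
    if isinstance(node, yaml.SequenceNode):
        return loader.construct_sequence(node)
    return loader.construct_scalar(node)

CfnLoader.add_multi_constructor('', any_constructor)

doc = """\
Resources:
  Bucket:
    Note: !Description my bucket
    Name: !Ref BucketName
"""
data = yaml.load(doc, Loader=CfnLoader)
print(description)  # ['my bucket']
```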

It is, however, more likely that some mapping has a scalar key description, and that you are interested in the value associated with that key:

    if isinstance(node, yaml.MappingNode):
        d = loader.construct_mapping(node)
        for k in d:
            if k == 'description':
                description.append(d[k])
        return d
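A complete version of the key-matching variant is sketched below; the `CfnLoader` subclass and the `!Custom` tag in the sample document are illustrative assumptions. One caveat: PyYAML only calls a multi-constructor for tags it has no regular constructor for, so this catches `description` keys in *tagged* mappings only. An ordinary untagged mapping is built by SafeLoader's own constructor and never reaches `any_constructor`.

```python
import yaml

class CfnLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass, so yaml.SafeLoader itself is not modified."""

description = []  # values found under a 'description' key of tagged mappings

def any_constructor(loader, tag_suffix, node):
    if isinstance(node, yaml.MappingNode):
        d = loader.construct_mapping(node)
        # Record the value stored under 'description' in this tagged mapping
        if 'description' in d:
            description.append(d['description'])
        return d
    if isinstance(node, yaml.SequenceNode):
        return loader.construct_sequence(node)
    return loader.construct_scalar(node)

CfnLoader.add_multi_constructor('', any_constructor)

doc = """\
Tagged: !Custom
  description: collected
Plain:
  description: not collected
"""
data = yaml.load(doc, Loader=CfnLoader)
print(description)  # ['collected']: the untagged mapping is never passed in
```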

If you know the exact position in the data hierarchy, you can of course also walk the data structure and extract anything you need based on keys or list positions. Especially in that case you'd be better off using my ruamel.yaml, as it can load tagged YAML in round-trip mode without extra effort (assuming the above inp):

from ruamel.yaml import YAML

yaml = YAML()  # the default typ='rt' (round-trip) loader keeps unknown tags instead of failing
data = yaml.load(inp)
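The "walk the data structure" route also works with PyYAML once the tags are stripped. A minimal sketch, assuming the tag-dropping `any_constructor` from above registered on a hypothetical `CfnLoader` subclass and a made-up template: index directly when you know the position, or search recursively (the helper `find_values` is illustrative, not a library function) when you don't. Note that CloudFormation spells its top-level key `Description`, and YAML keys are case-sensitive.

```python
import yaml

class CfnLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass, so yaml.SafeLoader itself is not modified."""

def any_constructor(loader, tag_suffix, node):
    # Build the plain value for any unknown tag and drop the tag
    if isinstance(node, yaml.MappingNode):
        return loader.construct_mapping(node)
    if isinstance(node, yaml.SequenceNode):
        return loader.construct_sequence(node)
    return loader.construct_scalar(node)

CfnLoader.add_multi_constructor('', any_constructor)

def find_values(data, key):
    """Yield every value stored under `key`, at any depth in nested dicts and lists."""
    if isinstance(data, dict):
        for k, v in data.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(data, list):
        for item in data:
            yield from find_values(item, key)

template = """\
Description: My stack
Resources:
  Web:
    Type: AWS::EC2::Instance
    Properties:
      Tags:
        - Key: Name
          Value: !Ref AWS::StackName
Outputs:
  Ip:
    Description: Public IP
    Value: !GetAtt Web.PublicIp
"""
data = yaml.load(template, Loader=CfnLoader)
print(data['Description'])                     # known position: My stack
print(list(find_values(data, 'Description')))  # any depth: ['My stack', 'Public IP']
```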

I like the `any_constructor`. – Eddy Pronk Sep 09 '18 at 06:34

Eddy Pronk · Accepted Answer · 2018-09-09T08:26:45.577

You can define custom constructors using a custom yaml.SafeLoader subclass:

import yaml

doc = '''
Conditions: 
  CreateNewSecurityGroup: !Equals [!Ref ExistingSecurityGroup, NONE]
'''

class Equals(object):
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return "Equals(%s)" % self.data

class Ref(object):
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return "Ref(%s)" % self.data

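def create_equals(loader, node):
    # !Equals takes a two-element sequence: [value_1, value_2]
    value = loader.construct_sequence(node)
    return Equals(value)

def create_ref(loader, node):
    # !Ref takes a scalar: the logical name of a parameter or resource
    value = loader.construct_scalar(node)
    return Ref(value)

# Register the constructors on a SafeLoader subclass rather than on yaml.SafeLoader itself
class Loader(yaml.SafeLoader):
    pass

yaml.add_constructor(u'!Equals', create_equals, Loader)
yaml.add_constructor(u'!Ref', create_ref, Loader)
a = yaml.load(doc, Loader)
print(a)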

Outputs:

{'Conditions': {'CreateNewSecurityGroup': Equals([Ref(ExistingSecurityGroup), 'NONE'])}}

edited Sep 09 '18 at 08:26

answered Sep 09 '18 at 05:56

Eddy Pronk

Better, but there is no need to make an extra class, especially since that obfuscates the fact that registering these constructors still changes **all** future YAML loading by the program (which is a PyYAML deficiency). And this still cannot handle `!Split` or any other CloudFormation construct that might show up. – Anthon Sep 09 '18 at 06:07
@Anthon All future YAML loading? I assumed `yaml.add_constructor` would only change it in the scope of my `Loader` class. That's a serious flaw. – Eddy Pronk Sep 09 '18 at 06:16
It has been a while since I looked at that, but IIRC all the add_constructor calls add to the **class** variable `yaml_constructors` on `BaseConstructor`. That can indeed be a serious problem if you deal with multiple, different YAML documents to parse. But it is to be expected if you cannot pass in an instance of a class Loader, but have to pass in the class (or subclass) itself. That is the main reason why `ruamel.yaml`'s new API has the `yaml = YAML()` instantiation construct: to be able to move away from this (and at some point I need to break backwards compatibility because of that). – Anthon Sep 09 '18 at 06:28
@Anthon When I add a call `yaml.load(doc)` I get the error "could not determine a constructor for the tag '!Equals'". So, I think it is scoped. (PyYAML 3.12) – Eddy Pronk Sep 09 '18 at 07:22
It is sort of scoped. With your code as in your answer, first try changing the `Loader` parameter of the first `add_constructor` call to `yaml.SafeLoader`. Then revert and change the `Loader` parameter of the second `add_constructor` call instead. (BTW, you should start using the `print` function instead of the `print` statement.) – Anthon Sep 09 '18 at 08:23
@Anthon I'm probably doing something wrong, but I can't reproduce a scoping issue here. Feel free to use my example code for a bug report: https://github.com/yaml/pyyaml/ or post a gist here. – Eddy Pronk Sep 09 '18 at 08:38
@AzatIbrakov There is some value in the comments. I'll delete the first one soon. – Eddy Pronk Sep 09 '18 at 08:42
@EddyPronk http://www.ruamel.eu/dl/static/ab51cdda-92a7-4e3e-a0bb-1e2abb98cbc1/eddy_pronk_00.html – Anthon Sep 09 '18 at 09:08
@Anthon That's weird behaviour. If `add_constructor` is patching the loader classes, then it could be explained: it is then patching `Loader` and its base class. When I create a class `Loader2` I can't reproduce it. – Eddy Pronk Sep 09 '18 at 09:22
It does; it is the delayed copying of the [parent class' constructors](https://github.com/yaml/pyyaml/blob/master/lib3/yaml/constructor.py#L145). Essentially, you should always do all of the add_constructor calls on a `BaseLoader` subclass directly after defining it, although that is not always practical if you gather and register classes from other modules. – Anthon Sep 09 '18 at 09:33
@Anthon Maybe we are saying the same thing, but I meant that if you stay away from using `yaml.SafeLoader` directly the scoping seems to work fine. https://gist.github.com/epronk/94b07803745e908ea6e2b81964bc379a – Eddy Pronk Sep 09 '18 at 09:42
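The ordering effect discussed in these comments can be reproduced without touching `yaml.SafeLoader` itself. In the sketch below (the class names `Base`, `Early` and `Late` are made up), `add_constructor` copies the inherited constructor table into a class the first time that class registers something. A subclass that made its copy *before* the parent gained `!Ref` never sees it; one that copies afterwards does. This reflects the PyYAML `add_constructor` code linked above; other versions may behave differently.

```python
import yaml

doc = "x: !Equals [!Ref ExistingSecurityGroup, NONE]\n"

def create_equals(loader, node):
    return ('Equals', loader.construct_sequence(node))

def create_ref(loader, node):
    return ('Ref', loader.construct_scalar(node))

class Base(yaml.SafeLoader):
    pass

class Early(Base):
    pass

class Late(Base):
    pass

# Early copies the inherited table now, before Base knows about !Ref
Early.add_constructor('!Equals', create_equals)
# Base gets its own copy of the table and adds !Ref to it
Base.add_constructor('!Ref', create_ref)
# Late copies Base's table only now, so it also inherits !Ref
Late.add_constructor('!Equals', create_equals)

def try_load(loader):
    """Return the loaded data, or the error text if a tag has no constructor."""
    try:
        return yaml.load(doc, Loader=loader)
    except yaml.constructor.ConstructorError as exc:
        return exc.problem

print(try_load(Early))  # could not determine a constructor for the tag '!Ref'
print(try_load(Late))   # {'x': ('Equals', [('Ref', 'ExistingSecurityGroup'), 'NONE'])}
```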
