
(Side note: this is a follow-up to https://sourceforge.net/p/ruamel-yaml/tickets/313/.)

I'm building a GitLab CI pipeline by defining a .gitlab-ci.yml file (see https://docs.gitlab.com/ee/ci/yaml/).

Because my CI consists of several very similar build steps, I use YAML anchors heavily, for example to define a common cache and common before-scripts.

I read that, according to the spec, the correct way to merge several YAML anchors is:

before-script: &before-script
...

cache: &cache
...

ci-step:
    image: ABC
    <<: [*before-script, *cache]
    script: ...
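
As a rough illustration of what the sequence form means, the YAML 1.1 merge key type (yaml.org/type/merge.html) says that keys from mappings earlier in the `<<` sequence override keys from later ones, and keys written explicitly in the mapping override all merged keys. The minimal Python sketch below applies those two rules to plain dicts; the helper name `apply_merge` and the sample contents of the two anchors are hypothetical, not part of ruamel.yaml or GitLab.

```python
from typing import Any, Dict, List


def apply_merge(explicit: Dict[str, Any], merged: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: resolve `<<: [m1, m2, ...]` per the YAML merge key type.

    Keys from earlier mappings in the list win over keys from later ones,
    and keys written explicitly in the mapping win over all merged keys.
    """
    result: Dict[str, Any] = {}
    # Walk the merged mappings in reverse so earlier ones overwrite later ones.
    for mapping in reversed(merged):
        result.update(mapping)
    # Explicit keys always take precedence over merged keys.
    result.update(explicit)
    return result


# Hypothetical contents for the &before-script and &cache anchors.
before_script = {"before_script": ["echo setup"], "tags": ["docker"]}
cache = {"cache": {"paths": ["build/"]}, "tags": ["shared"]}

ci_step = apply_merge({"image": "ABC", "script": "..."}, [before_script, cache])
print(ci_step["tags"])  # ['docker']: before_script is first in the list, so it wins
```

Both anchors define `tags` here only to show the precedence rule; with disjoint keys the order in the sequence would not matter.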
   

However, the following also works fine with GitLab CI and, IMHO, is much nicer to read:

...

ci-step:
    image: abc
    <<: *before-script
    script: ...
    <<: *cache

It also lets me put the merge keys at different positions in the mapping.

So far, so good: this works in GitLab CI.

We also use https://github.com/pre-commit/pre-commit-hooks to validate the YAML files in our repository, and pre-commit-hooks uses ruamel.yaml internally for YAML validation.

As a result, the pre-commit hook fails with the following error message:

while constructing a mapping
   in ".gitlab-ci.yml", line xx, column y
found duplicate key "<<"
   in ".gitlab-ci.yml", line zz, column y

How can I prevent ruamel.yaml from raising this exception when the duplicated key is <<?

Alternatively, pre-commit-hooks could be updated to set allow_duplicate_keys = True (see yaml-duplicate-keys), but that would also allow all other duplicate keys, which is not ideal.
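
The policy asked for here (reject duplicate keys, except for <<) can be stated independently of any YAML library. A minimal Python sketch over a mapping's ordered (key, value) pairs follows; the helper `duplicate_keys` is hypothetical and is not a ruamel.yaml or pre-commit-hooks API, and the pairs simply mirror the ci-step mapping above.

```python
from collections import Counter
from typing import Any, Iterable, List, Tuple

MERGE_KEY = "<<"


def duplicate_keys(
    pairs: Iterable[Tuple[str, Any]],
    allowed: frozenset = frozenset({MERGE_KEY}),
) -> List[str]:
    """Hypothetical check: return keys that occur more than once in a mapping's
    (key, value) pairs, ignoring keys listed in `allowed` (by default only '<<')."""
    counts = Counter(key for key, _ in pairs)
    return [key for key, n in counts.items() if n > 1 and key not in allowed]


# The ci-step mapping from the question, as an ordered list of (key, value) pairs.
ci_step = [("image", "abc"), ("<<", "*before-script"), ("script", "..."), ("<<", "*cache")]
print(duplicate_keys(ci_step))  # [] because only '<<' is repeated, which is tolerated

broken = ci_step + [("image", "def")]
print(duplicate_keys(broken))  # ['image'] because a real duplicate is still reported
```

Wiring a check like this into ruamel.yaml itself would still require a custom constructor; the sketch only pins down the intended policy.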

Anthon
RamNow

1 Answer


The normal way to prevent duplicate keys from throwing an error is to set .allow_duplicate_keys, as you indicated. If you set that, the value of a duplicate key that appears 'later' in the mapping overwrites the previous value. In PyYAML, from which ruamel.yaml was derived, this overwriting is the side effect of a bug.

However, duplicating << is IMO more problematic, because

<<: *a
<<: *b

is undefined, and it might be expected to work as if the YAML document contained:

<<: [*a, *b]

or contained:

<<: [*b, *a]

or only:

<<: *b

or:

<<: *a

Depending on what key-value pairs a and b refer to, each of these can give a different result for the mapping in which the merge is applied.
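
To make the ambiguity concrete, here is a minimal Python sketch that resolves each of the four readings with the rules of the YAML merge key type (earlier mappings in a `<<` sequence win, explicit keys win over merged ones). The helper `apply_merge` and the contents of a and b are hypothetical, chosen so that both define the key `k`.

```python
from typing import Any, Dict, List


def apply_merge(explicit: Dict[str, Any], merged: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: earlier merged mappings win, explicit keys win over all."""
    result: Dict[str, Any] = {}
    for mapping in reversed(merged):  # later mappings first, so earlier ones overwrite
        result.update(mapping)
    result.update(explicit)
    return result


# Hypothetical anchored mappings that share the key "k".
a = {"k": "from a", "only_a": 1}
b = {"k": "from b", "only_b": 2}

# The four possible readings of a mapping with two '<<' keys (*a first, then *b).
readings = {
    "<<: [*a, *b]": apply_merge({}, [a, b]),
    "<<: [*b, *a]": apply_merge({}, [b, a]),
    "<<: *b": apply_merge({}, [b]),
    "<<: *a": apply_merge({}, [a]),
}
for source, result in readings.items():
    print(f"{source:14} -> {result}")
```

The two sequence readings disagree on the value of `k`, and each single-anchor reading loses either `only_a` or `only_b`, so all four produce a different mapping.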

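To prevent the error from being thrown on merge keys only, you need to adapt the loader. The constructor below simply drops every merge key, duplicated or not, so the merged key-value pairs are lost; make sure you don't try to use or dump the result: garbage in means garbage out.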

import ruamel.yaml

yaml_str = """\
before-script: &before-script
  x: 1

cache: &cache
  y: 2

ci-step:
  image: ABC
  <<: *before-script
  script: DEF
  <<: *cache
"""

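class MyConstructor(ruamel.yaml.SafeConstructor):
    def flatten_mapping(self, node):
        # Instead of applying merges, remove every merge key ('<<') from the
        # mapping node, so repeated '<<' keys can no longer raise an error.
        index = 0
        while index < len(node.value):
            key_node, value_node = node.value[index]
            if key_node.tag == 'tag:yaml.org,2002:merge':
                # Deleting shifts the next pair into this index, so don't advance.
                del node.value[index]
            else:
                index += 1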

yaml = ruamel.yaml.YAML(typ='safe')
yaml.Constructor = MyConstructor

data = yaml.load(yaml_str)
print(list(data['ci-step'].keys()))

which gives:

['image', 'script']

You should complain to GitLab that it accepts invalid YAML, which is especially bad because that YAML has no defined loading behaviour. If they insist on continuing to support this kind of invalid YAML, they should tell you what it means for the mapping in which it occurs.

Greg Dubicki
Anthon