
Consider the following file:

- k0: v0
  vars: &splat0
    VAR0: potato  # vars from blob0

- k1: v1
  vars: &splat1
    VAR1: spud    # vars from blob1

- k: v
  extra:          # merged vars from blob0 + blob1
    <<: *splat0
    <<: *splat1

It makes use of YAML's merge key feature.

Is this a valid YAML file? The spec (1.1, 1.2) says that within a mapping node there is the "restriction that each of the keys is unique". However, it is not clear whether the merge keys themselves are subject to the uniqueness constraint, or whether only the mapping keys that remain after the merges are resolved need to be unique.

PyYAML parses this and merges the keys, but the comments are lost. ruamel.yaml is able to preserve comments but raises a DuplicateKeyError; if you explicitly allow duplicate keys, it parses, but the merge is lost.

Is this input valid YAML and how should it be correctly parsed in Python?

wim
  • "PyYAML parses this and merges keys, but the comments are lost" so it's valid? I'm not sure what you're trying to do, but I'm not sure the question encapsulates it – roganjosh Feb 13 '20 at 20:13
  • Ideally I want to preserve comments and not muck up the merges (I am not in control of the input). But that's not what this question is about, this question is much smaller in scope: **are duplicate merge keys actually allowed in the YAML spec**. – wim Feb 13 '20 at 20:39
  • Does this answer your question? [Configuring ruamel.yaml to allow duplicate keys](https://stackoverflow.com/questions/55540686/configuring-ruamel-yaml-to-allow-duplicate-keys) – sanyassh Feb 13 '20 at 21:24
  • Answer in proposed duplicate does the merge correctly and preserves comments. – sanyassh Feb 13 '20 at 21:24
  • Also checked with http://www.yamllint.com/, https://yaml-online-parser.appspot.com/ and some others, all say that it is valid yaml. – sanyassh Feb 13 '20 at 21:25
  • @sanyash Those sites also claim that YAML with some other duplicate mapping keys is "valid", which is incorrect. The sites are wrong. The proposed duplicate does not address the question at all. – wim Feb 13 '20 at 21:27
  • Okay, you can be right about sites being wrong, but are you sure the proposed duplicate does not address your issue? Did you look through the body, not only at the title? `In this case the duplicate key happens to be a merge key <<:.` seems very similar to your case. – sanyassh Feb 13 '20 at 21:31
  • I am sure. This question is about whether the *YAML spec allows duplicate merge keys*. The answer on that question doesn't address this (and they're actually the author of ruamel, so it would be a conflict of interest regardless). But the post is interesting anyway, so thanks for the link. – wim Feb 13 '20 at 21:33

1 Answer


Merge keys are just like any other key; they are only interpreted in a specially defined way when a YAML parser implements the merge key extension (which it doesn't have to). In my opinion this is therefore invalid YAML.

But there is another argument against this, even if the merge key were so special that it didn't follow the normal key restrictions. Let's assume your input file looked like:

- k0: v0
  vars: &splat0
    VAR0: potato  # vars from blob0
    VAR2: tater

- k1: v1
  vars: &splat1
    VAR1: spud    # vars from blob1
    VAR2: tuber

- k: v
  extra:          # merged vars from blob0 + blob1
    <<: *splat0
    <<: *splat1

And also assume you could load this incorrect YAML into data. What would be the value of data[2]['extra']['VAR2']?

The YAML specification explicitly states:

In particular, mapping key order, comments, and tag handles should not be referenced during composition.

So unless you violate another explicit restriction (that key order must not be relied upon), you cannot parse this correctly. Relying on key order is what PyYAML does, which is IMHO a bug.

This means that when you correctly implement the YAML specification, you cannot decide whether the values of data[2]['extra'] are first updated with VAR2: tuber or with VAR2: tater. That is why ruamel.yaml doesn't allow this.
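
To make the ambiguity concrete, here is a minimal sketch that uses plain Python dicts in place of the two anchored mappings. It involves no YAML library, and `merge_in_order` is a hypothetical helper, not something PyYAML or ruamel.yaml provides. It only shows that the two possible application orders of the duplicated `<<` keys disagree on `VAR2`:

```python
# Plain dicts standing in for the mappings anchored as &splat0 and &splat1.
splat0 = {"VAR0": "potato", "VAR2": "tater"}
splat1 = {"VAR1": "spud", "VAR2": "tuber"}


def merge_in_order(*mappings):
    """Hypothetical helper: apply mappings one after another, later ones overwriting."""
    merged = {}
    for mapping in mappings:
        merged.update(mapping)
    return merged


# The two orders in which a loader could apply the duplicated merge keys.
first_order = merge_in_order(splat0, splat1)
second_order = merge_in_order(splat1, splat0)

print(first_order["VAR2"])   # tuber
print(second_order["VAR2"])  # tater
```

Since key order within a mapping carries no meaning, nothing in the document tells a loader which of the two results is intended.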

It is of course well defined in the merge key spec which value for key VAR2 you get when you do:

      <<: [*splat0, *splat1]
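
As a rough illustration of the rule the merge key spec gives for the sequence form, the sketch below models it with plain dicts. `resolve_merge` is a hypothetical helper written for this example, not an API of any YAML library. It assumes that keys from mappings earlier in the sequence win over later ones, and that keys written explicitly in the mapping win over merged keys:

```python
# Plain dicts standing in for the mappings anchored as &splat0 and &splat1.
splat0 = {"VAR0": "potato", "VAR2": "tater"}
splat1 = {"VAR1": "spud", "VAR2": "tuber"}


def resolve_merge(explicit, merge_sequence):
    """Hypothetical model of `<<: [a, b, ...]` inside a mapping.

    Earlier mappings in the sequence take precedence over later ones,
    and explicitly written keys take precedence over merged keys.
    """
    resolved = {}
    for mapping in merge_sequence:
        for key, value in mapping.items():
            resolved.setdefault(key, value)  # first occurrence in the sequence wins
    resolved.update(explicit)  # explicit keys override anything merged in
    return resolved


extra = resolve_merge({}, [splat0, splat1])
print(extra["VAR2"])  # tater
```

Because a sequence is ordered, the result no longer depends on the order of keys in a mapping.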
Anthon
  • Hi Anthon, the alternate example is not interesting to me because after resolving merges the key VAR2 is non-unique. So, it should not successfully parse, regardless. We're in agreement that PyYAML is bugged here, and they have an open issue since forever about it... – wim Feb 13 '20 at 22:03
  • What makes you think that `VAR2` has to be unique? The merge specification explicitly mentions `Keys in mapping nodes earlier in the sequence override keys specified in later mapping nodes.` (which IIRC PyYAML gets wrong for the valid `[*splat0, *splat1]` case as well). – Anthon Feb 13 '20 at 22:08
  • Because it makes more sense for the mapping keys to be resolved *before* checking for dupes, otherwise you could not detect conflicts on equal keys such as `0xf` and `15`. – wim Feb 13 '20 at 22:15
  • Looks like PyYAML merges `[ *BIG, *LEFT, *SMALL ]` correctly for [the example](https://yaml.org/type/merge.html) i.e. the final map ended up with `'r': 10`. Do you have some other failure mode? – wim Feb 13 '20 at 22:20
  • Might be that I didn't recall that correctly, or that it has been fixed. Anyway that has a non-unique key `r`, was your first comment indicating that if double merge keys would be allowed, then all the keys of all the (aliased) mappings have to be unique? That would be an extension to the merge key semantics. – Anthon Feb 13 '20 at 22:26
  • I don't know, but if you're asking for my _opinion_ I think that YAML is already far too complex and I don't like the merge syntax at all. btw it also seems to be a ruamel.yaml bug detecting dupe keys e.g. the last example here: http://dpaste.com/1MGYMFJ – wim Feb 13 '20 at 22:35