Is there any way to cause yaml.load to raise an exception whenever a given key appears more than once in the same dictionary?

For example, parsing the following YAML would raise an exception, because some_key appears twice:

{
  some_key: 0,
  another_key: 1,
  some_key: 1
}

Actually, the behavior described above corresponds to the simplest policy regarding key redefinitions. A somewhat more elaborate policy could, for example, specify that only redefinitions that change the value assigned to the key result in an exception, or could allow setting the severity of a key redefinition to "warning" rather than "error", and so on. An ideal answer to this question would support such variants.
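
To make the policy variants above concrete, here is a minimal sketch of how they might be expressed with PyYAML. The names `make_loader`, `PolicyLoader`, `DuplicateKeyWarning`, `severity` and `only_if_changed` are hypothetical and not part of any library; the sketch assumes that subclassing `yaml.SafeLoader` and overriding `construct_mapping` is an acceptable hook, and it deliberately skips merge keys (`<<`).

```python
import warnings

import yaml

MERGE_TAG = "tag:yaml.org,2002:merge"


class DuplicateKeyWarning(UserWarning):
    """Hypothetical warning category used by the 'warning' severity."""


def make_loader(severity="error", only_if_changed=False):
    """Return a SafeLoader subclass enforcing a duplicate-key policy (sketch)."""

    class PolicyLoader(yaml.SafeLoader):
        def construct_mapping(self, node, deep=False):
            seen = {}
            pairs = node.value if isinstance(node, yaml.MappingNode) else []
            for key_node, value_node in pairs:
                if key_node.tag == MERGE_TAG:
                    continue  # "<<" entries are left to the base class
                key = self.construct_object(key_node, deep=True)
                # Values are only needed when comparing old and new values.
                value = (self.construct_object(value_node, deep=True)
                         if only_if_changed else None)
                try:
                    repeated = key in seen
                except TypeError:
                    continue  # unhashable key: the base class reports it
                if repeated and not (only_if_changed and seen[key] == value):
                    self.report_duplicate(key, key_node)
                seen[key] = value
            # Build the actual mapping with the stock implementation.
            return super().construct_mapping(node, deep=deep)

        def report_duplicate(self, key, key_node):
            problem = "found duplicate key %r" % (key,)
            if severity == "warning":
                warnings.warn("%s\n%s" % (problem, key_node.start_mark),
                              DuplicateKeyWarning)
            else:
                raise yaml.constructor.ConstructorError(
                    "while constructing a mapping", None,
                    problem, key_node.start_mark)

    return PolicyLoader
```

With this sketch, `yaml.load(text, Loader=make_loader())` would raise on any repeated key, `make_loader(only_if_changed=True)` would tolerate a repeated key whose value is equal to the earlier one, and `make_loader(severity="warning")` would emit a warning and keep the last value.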

kjo
  • 2
    Huh, it appears that PyYAML should already do this (duplicate keys are disallowed by the YAML spec), and that it doesn't is actually [a bug that's been open for the last seven years](http://pyyaml.org/ticket/128). – Wander Nauta Dec 18 '15 at 15:42
  • 1
    That ticket has been migrated [here](https://github.com/yaml/pyyaml/issues/41). Still open. sadpanda.jpg – wim Oct 26 '17 at 01:30

2 Answers

If you want the loader to throw an error, then you should just define your own loader, with a constructor that checks if the key is already in the mapping ¹:

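try:
    # Python 3: Hashable lives in collections.abc (the collections alias was removed in 3.10)
    import collections.abc as collections
except ImportError:
    import collections  # Python 2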
import ruamel.yaml as yaml

from ruamel.yaml.reader import Reader
from ruamel.yaml.scanner import Scanner
from ruamel.yaml.parser_ import Parser
from ruamel.yaml.composer import Composer
from ruamel.yaml.constructor import Constructor, ConstructorError
from ruamel.yaml.resolver import Resolver
from ruamel.yaml.nodes import MappingNode
from ruamel.yaml.compat import PY2, PY3


class MyConstructor(Constructor):
    def construct_mapping(self, node, deep=False):
        if not isinstance(node, MappingNode):
            raise ConstructorError(
                None, None,
                "expected a mapping node, but found %s" % node.id,
                node.start_mark)
        mapping = {}
        for key_node, value_node in node.value:
            # keys can be list -> deep
            key = self.construct_object(key_node, deep=True)
            # lists are not hashable, but tuples are
            if not isinstance(key, collections.Hashable):
                if isinstance(key, list):
                    key = tuple(key)
            if PY2:
                try:
                    hash(key)
                except TypeError as exc:
                    raise ConstructorError(
                        "while constructing a mapping", node.start_mark,
                        "found unacceptable key (%s)" %
                        exc, key_node.start_mark)
            else:
                if not isinstance(key, collections.Hashable):
                    raise ConstructorError(
                        "while constructing a mapping", node.start_mark,
                        "found unhashable key", key_node.start_mark)

            value = self.construct_object(value_node, deep=deep)
            # next two lines differ from original
            if key in mapping:
                raise KeyError
            mapping[key] = value
        return mapping


class MyLoader(Reader, Scanner, Parser, Composer, MyConstructor, Resolver):
    def __init__(self, stream):
        Reader.__init__(self, stream)
        Scanner.__init__(self)
        Parser.__init__(self)
        Composer.__init__(self)
        MyConstructor.__init__(self)
        Resolver.__init__(self)



yaml_str = """\
some_key: 0
another_key: 1
some_key: 1
"""

data = yaml.load(yaml_str, Loader=MyLoader)
print(data)

and that throws a KeyError.

Please note that the curly braces you use in your example are unnecessary.

I am not sure if this will work with merge keys.
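
On the merge-key question, here is a hedged sketch for PyYAML (not ruamel.yaml) that sidesteps the issue: it checks only the keys written explicitly in a mapping, skips `<<` entries, and then lets the stock `SafeLoader` logic perform the merge. `UniqueKeySafeLoader` is a hypothetical name, and treating an explicit key that overrides a merged one as legitimate is an assumption of this sketch.

```python
import yaml
from yaml.constructor import ConstructorError

MERGE_TAG = "tag:yaml.org,2002:merge"


class UniqueKeySafeLoader(yaml.SafeLoader):
    """Hypothetical loader: rejects repeated explicit keys, tolerates '<<'."""

    def construct_mapping(self, node, deep=False):
        seen = set()
        for key_node, _value_node in node.value:
            if key_node.tag == MERGE_TAG:
                continue  # merged-in keys are handled by the base class
            key = self.construct_object(key_node, deep=True)
            try:
                if key in seen:
                    raise ConstructorError(
                        "while constructing a mapping", node.start_mark,
                        "found duplicate key %r" % (key,), key_node.start_mark)
                seen.add(key)
            except TypeError:
                continue  # unhashable key: the base class reports it
        # SafeLoader flattens merge keys, then builds the dict as usual.
        return super().construct_mapping(node, deep=deep)


merge_doc = """\
base: &base
  a: 1
  b: 2
derived:
  <<: *base
  b: 3
"""

data = yaml.load(merge_doc, Loader=UniqueKeySafeLoader)
print(data["derived"])
```

Here the explicit `b: 3` overrides the merged `b: 2` without being reported, while a mapping that spells out the same key twice would still raise a `ConstructorError`.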


¹ This was done using ruamel.yaml, of which I am the author. ruamel.yaml is an enhanced version of PyYAML, and the loader code for the latter should be similar.

Anthon
  • 1
    Beautiful stuff. I'm glad to see that YAML is not being forgotten. YAML takes an already great idea (JSON), and makes it 10x better. I'm using YAML for config files, and I find that it takes care of every conceivable need in this problem domain. Thank you for this answer, and most of all, thank you for `ruamel.yaml`. – kjo Dec 19 '15 at 14:48
  • Also, thanks for the clue on the superfluous curly brackets. – kjo Dec 19 '15 at 14:50
  • This code is outdated on current version (0.15.34). `ModuleNotFoundError: No module named 'ruamel.yaml.parser_'`, fix that then `TypeError: __init__() got an unexpected keyword argument 'preserve_quotes'` – wim Oct 26 '17 at 01:35
  • Just using `yaml.load(yaml_str, Loader=ruamel.yaml.Loader)` seems to be working. – wim Oct 26 '17 at 01:38
  • @wim There is a **documented** break in the code with 0.15. If you use the latest 0.14 version you should be able to use the above code. – Anthon Oct 26 '17 at 15:22

Here's the equivalent code from Anthon's answer if you're using pyyaml:

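try:
    # Python 3: Hashable lives in collections.abc (the collections alias was removed in 3.10)
    import collections.abc as collections
except ImportError:
    import collections  # Python 2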
import yaml
import sys

from yaml.reader import Reader
from yaml.scanner import Scanner
from yaml.parser import Parser
from yaml.composer import Composer
from yaml.constructor import Constructor, ConstructorError
from yaml.resolver import Resolver
from yaml.nodes import MappingNode


class NoDuplicateConstructor(Constructor):
    def construct_mapping(self, node, deep=False):
        if not isinstance(node, MappingNode):
            raise ConstructorError(
                None, None,
                "expected a mapping node, but found %s" % node.id,
                node.start_mark)
        mapping = {}
        for key_node, value_node in node.value:
            # keys can be list -> deep
            key = self.construct_object(key_node, deep=True)
            # lists are not hashable, but tuples are
            if not isinstance(key, collections.Hashable):
                if isinstance(key, list):
                    key = tuple(key)

            if sys.version_info.major == 2:
                try:
                    hash(key)
                except TypeError as exc:
                    raise ConstructorError(
                        "while constructing a mapping", node.start_mark,
                        "found unacceptable key (%s)" %
                        exc, key_node.start_mark)
            else:
                if not isinstance(key, collections.Hashable):
                    raise ConstructorError(
                        "while constructing a mapping", node.start_mark,
                        "found unhashable key", key_node.start_mark)

            value = self.construct_object(value_node, deep=deep)

            # Actually do the check.
            if key in mapping:
                raise KeyError("Got duplicate key: {!r}".format(key))

            mapping[key] = value
        return mapping


class NoDuplicateLoader(Reader, Scanner, Parser, Composer, NoDuplicateConstructor, Resolver):
    def __init__(self, stream):
        Reader.__init__(self, stream)
        Scanner.__init__(self)
        Parser.__init__(self)
        Composer.__init__(self)
        NoDuplicateConstructor.__init__(self)
        Resolver.__init__(self)



yaml_str = """\
some_key: 0
another_key:
  x: 1
"""

data = yaml.load(yaml_str, Loader=NoDuplicateLoader)
print(data)
Wilfred Hughes