
The yaml library in Python does not detect duplicate keys. This bug was reported years ago and there is still no fix.

I would like to find a decent workaround for this problem. How feasible would it be to write a regex that returns all the keys? With that, detecting duplicates would be fairly easy.

Could any regex master suggest a regex that extracts all the keys so that duplicates can be found?

File example:

mykey1:
    subkey1: value1
    subkey2: value2
    subkey3:
      - value 3.1
      - value 3.2
mykey2:
    subkey1: this is not duplicated
    subkey5: value5
    subkey5: duplicated!
    subkey6:
       subkey6.1: value6.1
       subkey6.2: value6.2
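
To make the problem concrete, here is a minimal sketch, assuming PyYAML is installed and using a shortened copy of the example file. Loading it raises no error, and the later `subkey5` entry silently replaces the earlier one.

```python
import yaml

# Shortened copy of the example file, with the same duplicated key.
EXAMPLE = """\
mykey2:
    subkey1: this is not duplicated
    subkey5: value5
    subkey5: duplicated!
"""

# No exception is raised; the second subkey5 overwrites the first.
data = yaml.safe_load(EXAMPLE)
print(data["mykey2"]["subkey5"])
```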
Adrien Vergé
Tk421
  • Their implementation sucks! I agree with you, they should have added the option in the constructor. Did you find a way to verify documents programmatically? – Marcello DeSales Jun 05 '17 at 20:40
  • There are generally a couple of approaches to missing library features: either fix it or get a different library. Fixing it can range from a local patch to a full blown PR. If the library is generally useful and good, PR is the way to go – Mad Physicist Aug 05 '23 at 15:44

3 Answers


Overriding one of the built-in loaders is a more lightweight approach:

 import yaml

 # special loader with duplicate key checking
 class UniqueKeyLoader(yaml.SafeLoader):
     def construct_mapping(self, node, deep=False):
         seen = set()
         for key_node, value_node in node.value:
             key = self.construct_object(key_node, deep=deep)
             # explicit raise rather than assert: asserts are skipped under python -O
             if key in seen:
                 raise ValueError(f"Duplicate key: {key!r}")
             seen.add(key)
         return super().construct_mapping(node, deep)

then:

 with open(filename, 'r') as f:
     data = yaml.load(f, Loader=UniqueKeyLoader)
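
As a rough end-to-end check of this approach, the sketch below loads two inline documents instead of a file. It assumes PyYAML is installed; the `OK_TEXT` and `BAD_TEXT` names and the choice of `ValueError` are this sketch's own. It also shows why a per-mapping check is more reliable than a flat list of keys: the same key under two different parents is accepted, while a repeat inside one mapping is rejected.

```python
import yaml


class UniqueKeyLoader(yaml.SafeLoader):
    """SafeLoader variant that rejects duplicate keys within one mapping."""

    def construct_mapping(self, node, deep=False):
        seen = set()
        for key_node, _value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in seen:
                # ValueError is this sketch's choice of exception.
                raise ValueError(f"duplicate key: {key!r}")
            seen.add(key)
        return super().construct_mapping(node, deep)


# The same key under two different parents is fine.
OK_TEXT = """\
mykey1:
    subkey1: value1
mykey2:
    subkey1: this is not duplicated
"""

# The same key twice under one parent is rejected.
BAD_TEXT = """\
mykey2:
    subkey5: value5
    subkey5: duplicated!
"""

ok_data = yaml.load(OK_TEXT, Loader=UniqueKeyLoader)

try:
    yaml.load(BAD_TEXT, Loader=UniqueKeyLoader)
    duplicate_found = False
except ValueError as exc:
    duplicate_found = True
    print(exc)
```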
ErichBSchulz
  • This is elegant and straightforward, unlike many recommendations that require importing another package. – Dave Liu May 11 '21 at 00:09

The yamllint command-line tool does what you want:

pip install --user yamllint

Specifically, its key-duplicates rule detects repeated keys, where a later key overwrites an earlier one:

$ yamllint test.yaml
test.yaml
  1:1       warning  missing document start "---"  (document-start)
  10:5      error    duplication of key "subkey5" in mapping  (key-duplicates)

(It has many other rules that you can enable/disable or tweak.)
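
If the check has to run from Python rather than from the shell, yamllint can also be called as a library. The following is a minimal sketch, assuming yamllint is installed; the inline configuration enables only the key-duplicates rule, and the `TEXT` sample is made up for illustration.

```python
from yamllint import linter
from yamllint.config import YamlLintConfig

# Enable only the key-duplicates rule; no other rule is configured.
config = YamlLintConfig("rules:\n  key-duplicates: enable\n")

TEXT = """\
mykey2:
    subkey1: this is not duplicated
    subkey5: value5
    subkey5: duplicated!
"""

# linter.run yields one problem object per finding.
problems = list(linter.run(TEXT, config))
for problem in problems:
    print(f"{problem.line}:{problem.column} {problem.level} {problem.desc} ({problem.rule})")
```

Each problem carries the line, column, level, and rule name, so a script can fail a build or collect every duplicate rather than stopping at the first one.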

Adrien Vergé

Great effort by ErichBSchulz, and thanks for the fixed code. Here is a minor change: the error message now reports the file name, line, and column of the duplicate key.

import yaml

class UniqueKeyLoader(yaml.SafeLoader):
    def construct_mapping(self, node, deep=False):
        mapping = set()
        for key_node, value_node in node.value:
            each_key = self.construct_object(key_node, deep=deep)
            if each_key in mapping:
                raise ValueError(f"Duplicate Key: {each_key!r} is found in YAML File.\n"
                                 f"Error File location: {key_node.end_mark}")
            mapping.add(each_key)
        return super().construct_mapping(node, deep)

with open(test_suite_full_path, 'r') as f:
    yaml_ret_dict = yaml.load(f, Loader=UniqueKeyLoader)
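
For a more compact location in the message, the mark's fields can be read directly. This is a minimal sketch, assuming PyYAML is installed; using `start_mark` (the start of the key) and the `name:line:column` layout are choices of this sketch, not part of the code above. `Mark.line` and `Mark.column` are 0-based, hence the `+ 1`.

```python
import yaml


class UniqueKeyLoader(yaml.SafeLoader):
    def construct_mapping(self, node, deep=False):
        seen = set()
        for key_node, _value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in seen:
                mark = key_node.start_mark  # where the key begins
                # Mark.line and Mark.column are 0-based, so add 1 for display.
                raise ValueError(
                    f"{mark.name}:{mark.line + 1}:{mark.column + 1}: "
                    f"duplicate key {key!r}"
                )
            seen.add(key)
        return super().construct_mapping(node, deep)


TEXT = """\
mykey2:
    subkey5: value5
    subkey5: duplicated!
"""

try:
    yaml.load(TEXT, Loader=UniqueKeyLoader)
    message = None
except ValueError as exc:
    message = str(exc)
    print(message)
```

When the input is an open file, `mark.name` is the file's name; for an in-memory string PyYAML substitutes a placeholder name.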
markalex
muthukumar