3

I'm using YamlDotNet to parse simple configuration files (no deep nesting, etc.). The deserializer will parse strings containing duplicate fields, overwriting earlier values. For example,

    foo: bar
    foo: baz

is considered equivalent to

    foo: baz

For my application, I would prefer that such duplicates cause an exception to be thrown. Is this possible?

MJD

3 Answers

3

The UniqueKeysDictionary example didn't work for me, and from the second example it wasn't clear how exactly to validate. But I found a much simpler way to do it, provided that all duplicates are disallowed and loading the file twice is acceptable:

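    private T DeserializeAndValidate<T>(StreamReader reader)
    {
        // First pass: load into the representation model purely for validation.
        var yaml = new YamlStream();
        yaml.Load(reader); // throws if duplicates are found

        // Rewind for the second pass; this requires a seekable underlying stream.
        reader.BaseStream.Seek(0, SeekOrigin.Begin);

        // Note: disposing reader2 also closes the underlying stream.
        using (var reader2 = new StreamReader(reader.BaseStream))
        {
            // Second pass: deserialize into the target type.
            var deserializer = new Deserializer();
            var data = deserializer.Deserialize<T>(reader2);
            return data;
        }
    }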
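
If the input is not a seekable stream, the same two-pass idea can be sketched by reading the text into memory once and parsing that copy twice. This is only a sketch under assumptions: `YamlLoader` is a hypothetical name, and it relies on the behavior described above, namely that `YamlStream.Load` throws when a mapping contains a duplicate key.

```csharp
using System;
using System.IO;
using YamlDotNet.RepresentationModel;
using YamlDotNet.Serialization;

public static class YamlLoader // hypothetical helper, not part of YamlDotNet
{
    public static T DeserializeAndValidate<T>(TextReader reader)
    {
        // Read the whole document once so no stream has to be rewound.
        string text = reader.ReadToEnd();

        // First pass: validation only. Per the answer above, loading into the
        // representation model throws if duplicate keys are found.
        new YamlStream().Load(new StringReader(text));

        // Second pass: deserialize the same text into the target type.
        return new Deserializer().Deserialize<T>(new StringReader(text));
    }
}
```

The exact exception type raised by the first pass is not stated above and may differ between YamlDotNet versions, so callers may want to catch broadly and inspect the message.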
1

The default node deserializers use the indexer to assign values. One way to achieve the desired behavior is to deserialize to a type that does not allow duplicate keys, such as:

    public class UniqueKeysDictionary<TKey, TValue>
        : Dictionary<TKey, TValue>
        , IDictionary<TKey, TValue>
    {
        TValue IDictionary<TKey, TValue>.this[TKey key]
        {
            get { return base[key]; }
            set { base.Add(key, value); }
        }
    }

A fully working example can be found here.

One significant issue with this solution is that it violates the contract of the indexer, whose behavior should be to overwrite the value.
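
To make that contract issue concrete, here is a small sketch that uses only the base class library. The same object overwrites when accessed through its concrete type but throws when accessed through `IDictionary<TKey, TValue>`. The class is repeated so the snippet is self-contained; `IndexerContractDemo` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public class UniqueKeysDictionary<TKey, TValue>
    : Dictionary<TKey, TValue>
    , IDictionary<TKey, TValue>
{
    TValue IDictionary<TKey, TValue>.this[TKey key]
    {
        get { return base[key]; }
        set { base.Add(key, value); }
    }
}

public static class IndexerContractDemo // hypothetical demo class
{
    public static void Main()
    {
        var dict = new UniqueKeysDictionary<string, string>();

        // Through the concrete type, Dictionary's own indexer runs: it overwrites.
        dict["foo"] = "bar";
        dict["foo"] = "baz";
        Console.WriteLine(dict["foo"]); // prints "baz"

        // Through the interface, the explicit implementation runs: Add() throws.
        IDictionary<string, string> viaInterface = dict;
        try
        {
            viaInterface["foo"] = "qux";
        }
        catch (ArgumentException)
        {
            Console.WriteLine("duplicate key rejected");
        }
    }
}
```

Code that holds the dictionary as `IDictionary<TKey, TValue>` and expects assignment to replace an existing value would therefore fail at run time.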

Another approach would be to replace the implementation of GenericDictionaryNodeDeserializer with one that uses the Add() method instead of the indexer. This is the relevant portion of a different example that shows how to replace a node deserializer:

    var deserializer = new Deserializer();

    var objectDeserializer = deserializer.NodeDeserializers
        .Select((d, i) => new {
            Deserializer = d as ObjectNodeDeserializer,
            Index = i
        })
        .First(d => d.Deserializer != null);

    deserializer.NodeDeserializers[objectDeserializer.Index] =
        new ValidatingNodeDeserializer(objectDeserializer.Deserializer);

Antoine Aubry
  • Thanks for the detailed response, and for authoring this library. I got the second approach to work. It was a little tricky, because my GenericUniqueDictionaryNodeDeserializer, which was GenericDictionaryNodeDeserializer with the one-line change you suggested, used some code from YamlDotNet.Serialization.Utilities.ReflectionUtility and YamlDotNet.ReflectionExtensions that was inaccessible due to protection level, so I had to copy that in too. – MJD Jun 15 '15 at 01:44
  • This did not work for me with .net core 5 and yamldotnet 9.1.4 - https://dotnetfiddle.net/hfBX5R – JJS Feb 02 '21 at 06:00
1

There is a solution involving a linter, but I'm not sure it will be relevant to you since it won't cause an exception to be thrown in YamlDotNet. I'll post it anyway since it can avoid replacing the implementation of GenericDictionaryNodeDeserializer.

It's the yamllint command-line tool:

    sudo pip install yamllint

Specifically, it has a rule key-duplicates that detects duplicated keys:

    $ cat test.yml
    foo: bar
    foo: baz

    $ yamllint test.yml
    test.yml
      2:1       error    duplication of key "foo" in mapping  (key-duplicates)

Adrien Vergé