
I'm currently a little confused by PyYAML. I installed version 3.12 on both my Windows and my Linux system and noticed that it behaves differently when it comes to the order of the keys.

Let's have a look at this example YAML file:

functions:
    function_a:
        value_1: 1
        value_2: 1
        value_3: 1
    function_c:
        value_1: 1
        value_2: 1
        value_3: 1
    function_d:
        value_1: 1
        value_2: 1
        value_4: 1
    function_b:
        value_1: 1
        value_2: 1
        value_3: 1

I load the YAML file as usual via conf = yaml.load(fp).

Now, what's really strange is that when I iterate over all functions, I get a different order on the two systems.

On Windows I get:

import yaml
with open('myyamlfile.yml') as fp:
    conf = yaml.load(fp)
for function in conf['functions']:
    print(function)

function_a
function_c
function_d
function_b

On Linux the keys come out in alphabetical order:

import yaml
with open('myyamlfile.yml') as fp:
    conf = yaml.load(fp)
for function in conf['functions']:
    print(function)

function_a
function_b
function_c
function_d

And I really have no clue why. I'm running the same code with the same module version on both machines. The only differences between the two machines are the OS and the Python version: 3.6.5 on Windows and 3.4.8 on Linux.

Does anyone have a hint why this happens?

brillenheini

1 Answer


First of all, the YAML specification (both the old 1.1 that PyYAML is based on and the newer 1.2 specification from 2009) states that the keys of mappings are unordered. So you should not rely on any particular order being there after loading.
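As a plain-Python illustration of "unordered": two mappings that differ only in key order describe the same data. The two dicts below are hypothetical stand-ins for what a loader might return for the same mapping written in two different key orders; no YAML library is involved.

```python
# The same mapping, with its keys written in two different orders.
written_one_way = {'function_a': 1, 'function_c': 1, 'function_d': 1, 'function_b': 1}
written_other_way = {'function_a': 1, 'function_b': 1, 'function_c': 1, 'function_d': 1}

# Dict equality compares keys and values only, never their order,
# which matches the YAML notion of an unordered mapping.
print(written_one_way == written_other_way)  # True
```

So any order you observe after loading is a side effect of the Python dict, not something the YAML document guarantees.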

Then, under the hood, there is the difference that Python 3.6 dicts preserve insertion order (in the CPython implementation; it became a language guarantee for all implementations in 3.7), whereas dicts before Python 3.6 do not. PyYAML creates a dict and fills it in the order in which the keys are read from the YAML document, so with 3.6.5 you get the keys in insertion order, and with 3.4.8 you do not.
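A minimal sketch of the 3.6+ behaviour: fill a plain dict key by key, the way the answer describes PyYAML doing it, and then iterate. The list `keys_in_file_order` is a hypothetical stand-in for the keys as read from the question's document; no YAML parsing happens here.

```python
# Keys in the order they appear in the question's YAML file
# (hypothetical stand-in for what the loader reads).
keys_in_file_order = ['function_a', 'function_c', 'function_d', 'function_b']

functions = {}
for key in keys_in_file_order:
    # Insert one key at a time; the values are irrelevant here.
    functions[key] = {}

# On CPython 3.6+ (guaranteed from 3.7 on) iteration follows insertion order,
# i.e. the file order, not alphabetical order.
iteration_order = list(functions)
print(iteration_order)  # ['function_a', 'function_c', 'function_d', 'function_b']
```

This is exactly the function_a, function_c, function_d, function_b order from the Windows run: it is the order of the file, not an arbitrary one. On 3.4 the same loop may produce a different order, which in the question happened to look sorted.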


If you need the behavior to be the same for both versions, I suggest you sort the keys explicitly:

for function in sorted(conf['functions']):
    print(function)
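A small sketch showing that the sorted loop gives the same result whatever order the dict is in. The two `conf_*` dicts are hypothetical stand-ins for what `yaml.load()` returned on each machine; only their key order differs.

```python
# Hypothetical stand-ins for the loaded data on each machine:
# the same keys, in a different internal order.
conf_windows = {'functions': {'function_a': {}, 'function_c': {}, 'function_d': {}, 'function_b': {}}}
conf_linux = {'functions': {'function_a': {}, 'function_b': {}, 'function_c': {}, 'function_d': {}}}

# sorted() builds a new list of the keys in lexical order, so the result
# no longer depends on the order the dict happens to iterate in.
orders = [sorted(conf['functions']) for conf in (conf_windows, conf_linux)]

for order in orders:
    print(order)
```

Note that this gives you alphabetical order on both systems, not the order in which the functions were written in the file.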

If you really need to get the keys in the order they are in the YAML document, I suggest you take a look at ruamel.yaml (disclaimer: I am the author of that YAML 1.2 compatible package) and do e.g.:

import pathlib
import ruamel.yaml

yaml = ruamel.yaml.YAML()
file_name = pathlib.Path('myyamlfile.yml')
conf = yaml.load(file_name)
for function in conf['functions']:
    print(function)

This gets you, on Python 2.7 through 3.7, the same output you have with Python 3.6.5. (In ruamel.yaml, yaml.load() is safe by default.)

Anthon
  • Hi, well, it's actually the other way around from what you described. On my Windows machine with 3.6.5 the dict is unordered, while on Linux with 3.4.8 the dict gets sorted for no apparent reason. That's why I was wondering. What I'd like to achieve is that the order in which the entries under `functions` are written is preserved. But I have now solved my issue a little differently, by adding a runorder and adding each function to a dictionary with the runorder value as key, which then keeps my desired order. – brillenheini May 30 '18 at 08:28
  • I think there is some confusion about things being ordered versus (lexically) sorted. The ordering you have in 3.6+ is what I called insertion ordering in the documentation for ruamel.ordereddict, not sorted ordering. So the first key put in the dict (and not removed) is the first key you get when iterating, etc. If the order in the YAML file is what you want, then ruamel.yaml would solve your problem. Having an explicit list of keys works as well. Alternatively, make a sequence of single key-value items, as is done when dumping an explicit OrderedDict. – Anthon May 30 '18 at 11:06
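A minimal sketch of two order-preserving alternatives mentioned in these comments, in plain Python with the loaded data written out as literals, so no YAML library is needed to run it. The `runorder` field name, its values, and the final sort over the runorder keys are assumptions; the exact layout of the asker's solution was not shown.

```python
# 1) Hypothetical "runorder" field per function. YAML such as
#        function_c:
#            runorder: 2
#    would load into nested dicts like these:
functions = {
    'function_a': {'runorder': 1},
    'function_c': {'runorder': 2},
    'function_d': {'runorder': 3},
    'function_b': {'runorder': 4},
}
# Put each function into a dict keyed by its runorder value, then walk the
# sorted integer keys so the result does not depend on dict ordering.
by_runorder = {}
for name, settings in functions.items():
    by_runorder[settings['runorder']] = name
ordered_by_runorder = [by_runorder[n] for n in sorted(by_runorder)]

# 2) Sequence of single key-value mappings. YAML such as
#        functions:
#            - function_a: {value_1: 1}
#            - function_c: {value_1: 1}
#    loads as a list of one-key dicts; a list keeps its order on every Python version.
function_list = [
    {'function_a': {'value_1': 1}},
    {'function_c': {'value_1': 1}},
    {'function_d': {'value_1': 1}},
    {'function_b': {'value_1': 1}},
]
ordered_from_sequence = [name for item in function_list for name in item]

print(ordered_by_runorder)
print(ordered_from_sequence)
```

Both lists come out as function_a, function_c, function_d, function_b on any Python version, because the order is carried either by explicit numbers or by a YAML sequence, never by a mapping.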