I have the following YAML file named input.yaml:

cities:
  1: [0,0]
  2: [4,0]
  3: [0,4]
  4: [4,4]
  5: [2,2]
  6: [6,2]
highways:
  - [1,2]
  - [1,3]
  - [1,5]
  - [2,4]
  - [3,4]
  - [5,4]
start: 1
end: 4

I'm loading it using PyYAML and printing the result as follows:

import yaml

# Recent PyYAML versions require an explicit Loader argument for yaml.load()
with open("input.yaml", "r") as f:
    data = yaml.load(f, Loader=yaml.Loader)

print(data)

The result is the following data structure:

{ 'cities': { 1: [0, 0]
            , 2: [4, 0]
            , 3: [0, 4]
            , 4: [4, 4]
            , 5: [2, 2]
            , 6: [6, 2]
            }
, 'highways': [ [1, 2]
              , [1, 3]
              , [1, 5]
              , [2, 4]
              , [3, 4]
              , [5, 4]
              ]
, 'start': 1
, 'end': 4
}

As you can see, each city and highway is represented as a list. However, I want them to be represented as tuples, so I manually convert them using comprehensions:

import yaml

# Recent PyYAML versions require an explicit Loader argument for yaml.load()
with open("input.yaml", "r") as f:
    data = yaml.load(f, Loader=yaml.Loader)

data["cities"] = {k: tuple(v) for k, v in data["cities"].items()}
data["highways"] = [tuple(v) for v in data["highways"]]

print(data)

However, this seems like a hack. Is there some way to instruct PyYAML to directly read them as tuples instead of lists?

Anthon
Aadit M Shah

5 Answers

I wouldn't call what you've done hacky for what you are trying to do. As I understand it, the alternative is to use Python-specific tags in your YAML file so that the values are constructed as tuples when the file is loaded. However, this requires modifying your YAML file, which is tedious if the file is large.

The PyYAML documentation illustrates this further. You place a !!python/tuple tag in front of each structure that you want to be represented as a tuple. With your sample data, it would look like:

YAML FILE:

cities:
  1: !!python/tuple [0,0]
  2: !!python/tuple [4,0]
  3: !!python/tuple [0,4]
  4: !!python/tuple [4,4]
  5: !!python/tuple [2,2]
  6: !!python/tuple [6,2]
highways:
  - !!python/tuple [1,2]
  - !!python/tuple [1,3]
  - !!python/tuple [1,5]
  - !!python/tuple [2,4]
  - !!python/tuple [3,4]
  - !!python/tuple [5,4]
start: 1
end: 4

Sample code:

import yaml

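with open('y.yaml') as f:
    # !!python/tuple is a Python-specific tag, so yaml.safe_load() rejects it.
    # Recent PyYAML versions also require an explicit Loader argument.
    d = yaml.load(f, Loader=yaml.FullLoader)

print(d)

Which will output (on Python 3.7+, where dicts keep insertion order):

{'cities': {1: (0, 0), 2: (4, 0), 3: (0, 4), 4: (4, 4), 5: (2, 2), 6: (6, 2)}, 'highways': [(1, 2), (1, 3), (1, 5), (2, 4), (3, 4), (5, 4)], 'start': 1, 'end': 4}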
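
To make the loader requirement concrete, here is a minimal sketch (the one-line document and the variable names are made up for illustration) showing that the safe loader refuses the Python-specific tag, while FullLoader constructs a tuple:

```python
import yaml

# Illustrative one-line document; "point" is a made-up key.
DOC = "point: !!python/tuple [0, 0]\n"

# safe_load() only knows the standard YAML tags, so it raises on !!python/tuple.
try:
    yaml.safe_load(DOC)
    safe_result = "loaded"
except yaml.YAMLError:
    safe_result = "rejected"

# FullLoader understands !!python/tuple and builds a real tuple.
full = yaml.load(DOC, Loader=yaml.FullLoader)

print(safe_result)
print(full)
```

This is the trade-off of the tag approach: the file can no longer be read with safe_load() unless you register a constructor for the tag yourself.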
idjaw

Depending on where your YAML input comes from, your "hack" is a good solution, especially if you use yaml.safe_load() instead of the unsafe yaml.load(). If only the "leaf" sequences in your YAML file need to be tuples, you can do ¹:

import pprint
import ruamel.yaml
from ruamel.yaml.constructor import SafeConstructor


def construct_yaml_tuple(self, node):
    seq = self.construct_sequence(node)
    # only make "leaf sequences" into tuples, you can add dict 
    # and other types as necessary
    if seq and isinstance(seq[0], (list, tuple)):
        return seq
    return tuple(seq)

SafeConstructor.add_constructor(
    u'tag:yaml.org,2002:seq',
    construct_yaml_tuple)

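# Recent ruamel.yaml releases no longer provide ruamel.yaml.safe_load();
# a YAML(typ='safe', pure=True) instance uses the SafeConstructor patched above.
yaml = ruamel.yaml.YAML(typ='safe', pure=True)
with open('input.yaml') as fp:
    data = yaml.load(fp)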
pprint.pprint(data, width=24)

which prints:

{'cities': {1: (0, 0),
            2: (4, 0),
            3: (0, 4),
            4: (4, 4),
            5: (2, 2),
            6: (6, 2)},
 'end': 4,
 'highways': [(1, 2),
              (1, 3),
              (1, 5),
              (2, 4),
              (3, 4),
              (5, 4)],
 'start': 1}

If you then need to process more material where sequences need to be "normal" lists again, use:

SafeConstructor.add_constructor(
    u'tag:yaml.org,2002:seq',
    SafeConstructor.construct_yaml_seq)

¹ This was done using ruamel.yaml, a YAML 1.2 parser, of which I am the author. You should be able to do the same with the older PyYAML if you only ever need to support YAML 1.1 and/or cannot upgrade for some reason.
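
For readers who stay on PyYAML, the same idea can be sketched roughly as follows. This is an untested-in-production sketch, not the answer's own code: `LeafTupleLoader` and `construct_leaf_tuple` are hypothetical names, and the YAML is inlined as a shortened copy of the question's input. Registering the constructor on a subclass leaves yaml.SafeLoader itself untouched, so nothing has to be restored afterwards.

```python
import yaml


class LeafTupleLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that turns leaf sequences into tuples."""


def construct_leaf_tuple(loader, node):
    seq = loader.construct_sequence(node)
    # Keep a sequence of sequences as a list; only "leaf" sequences become tuples.
    if seq and isinstance(seq[0], (list, tuple)):
        return seq
    return tuple(seq)


# add_constructor copies the constructor table for the subclass,
# so yaml.SafeLoader keeps its default list behaviour.
LeafTupleLoader.add_constructor("tag:yaml.org,2002:seq", construct_leaf_tuple)

DOC = """\
cities:
  1: [0,0]
  2: [4,0]
highways:
  - [1,2]
  - [2,4]
start: 1
end: 2
"""

data = yaml.load(DOC, Loader=LeafTupleLoader)
print(data)

# Plain safe_load() is unaffected and still produces lists.
plain = yaml.safe_load("a: [1, 2]")
print(plain)
```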

Anthon
  • Unfortunately, I need `highways` to be a list of tuples instead of a tuple of tuples. Nevertheless, using `safe_load` instead of `load` is a great suggestion. Thanks. – Aadit M Shah Sep 18 '16 at 06:05
  • Oops, I missed that. I fixed it by making the tuple constructor inspect the first sequence element and not convert the sequence to a tuple if that element is a list. You can of course fine-tune that (inspecting all elements, checking for dicts, etc.). – Anthon Sep 18 '16 at 06:20
I ran into the same problem as the question, and I was not too satisfied with the two answers. While browsing the PyYAML documentation I found two really interesting functions: yaml.add_constructor and yaml.add_implicit_resolver.

The implicit resolver removes the need to tag every entry with !!python/tuple: it matches scalar strings against a regex and assigns the tag automatically. I also wanted to use tuple syntax, writing tuple: (10,120) instead of a list tuple: [10,120] that then gets converted to a tuple, which I personally found very annoying. I also did not want to install an external library. Here is the code:

import yaml
import re

# this is to convert the string written as a tuple into a python tuple
def yml_tuple_constructor(loader, node):
    # this little parser is really just for what I needed, feel free to change it!
    def parse_tup_el(el):
        # try to convert into int or float, else keep the string
        if el.isdigit():
            return int(el)
        try:
            return float(el)
        except ValueError:
            return el

    value = loader.construct_scalar(node)
    # remove the ( ) from the string
    tup_elements = value[1:-1].split(',')
    # remove the last element if the tuple was written as (x,b,)
    if tup_elements[-1] == '':
        tup_elements.pop(-1)
    tup = tuple(map(parse_tup_el, tup_elements))
    return tup

# !tuple is my own tag name, I think you could choose anything you want
yaml.add_constructor(u'!tuple', yml_tuple_constructor)
# this is to spot the strings written as tuples in the yaml
yaml.add_implicit_resolver(u'!tuple', re.compile(r"\(([^,\W]{,},){,}[^,\W]*\)"))

Finally by executing this:

>>> yml = yaml.load("""
   ...: cities:
   ...:   1: (0,0)
   ...:   2: (4,0)
   ...:   3: (0,4)
   ...:   4: (4,4)
   ...:   5: (2,2)
   ...:   6: (6,2)
   ...: highways:
   ...:   - (1,2)
   ...:   - (1,3)
   ...:   - (1,5)
   ...:   - (2,4)
   ...:   - (3,4)
   ...:   - (5,4)
   ...: start: 1
   ...: end: 4""", Loader=yaml.FullLoader)
>>> yml['cities']
{1: (0, 0), 2: (4, 0), 3: (0, 4), 4: (4, 4), 5: (2, 2), 6: (6, 2)}
>>> yml['highways']
[(1, 2), (1, 3), (1, 5), (2, 4), (3, 4), (5, 4)]

There could be a potential drawback with safe_load compared to load, which I did not test.
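
On the safe_load point: as far as I can tell, in PyYAML 5.1 and later the module-level yaml.add_constructor and yaml.add_implicit_resolver do not touch SafeLoader when called without a Loader argument, so safe_load would not see the !tuple tag. A minimal sketch of one way around that is to register both on a SafeLoader subclass. The names `ParenTupleLoader`, `INT_TUPLE` and `construct_int_tuple` are hypothetical, and the regex is deliberately narrower than the one above: it only accepts integers.

```python
import re

import yaml


class ParenTupleLoader(yaml.SafeLoader):
    """Hypothetical loader: SafeLoader plus "(a,b)" scalars read as int tuples."""


# Assumption for this sketch: tuples contain only (possibly negative) integers.
INT_TUPLE = re.compile(r"^\(\s*-?\d+(?:\s*,\s*-?\d+)*\s*,?\s*\)$")


def construct_int_tuple(loader, node):
    text = loader.construct_scalar(node)
    # Drop the parentheses, split on commas, skip the empty piece of "(x,y,)".
    return tuple(int(part) for part in text[1:-1].split(",") if part.strip())


# Both registrations are classmethods, so they affect only this subclass.
ParenTupleLoader.add_constructor("!tuple", construct_int_tuple)
ParenTupleLoader.add_implicit_resolver("!tuple", INT_TUPLE, first=["("])

DOC = """\
cities:
  1: (0,0)
  5: (2,2)
highways:
  - (1,2)
  - (5,4)
"""

data = yaml.load(DOC, Loader=ParenTupleLoader)
print(data)
```

Because the loader still derives from SafeLoader, Python-specific tags such as !!python/object remain rejected.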

Olivier
  • This only works for the simplest of tuples, such as the ones the OP used as an example. Using tuples that are tagged sequences allows you to nest tuples in tuples and to use aliases (or define anchors) within those tuples. Your code cannot cope with that, and you even need to change your code if you want to use something as simple as a boolean. – Anthon Aug 14 '18 at 07:30
In the YAML file, you write a tuple as a list.

params.yaml

foo:
  bar: ["a", "b", "c"]
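
A minimal sketch of how that might be consumed, assuming the params.yaml content above is inlined as a string and that the conversion to a tuple happens in your own code after loading:

```python
import yaml

# Inlined copy of the params.yaml shown above.
PARAMS = 'foo:\n  bar: ["a", "b", "c"]\n'

params = yaml.safe_load(PARAMS)

# YAML itself has no tuple type; convert the loaded list where a tuple is needed.
bar = tuple(params["foo"]["bar"])
print(bar)
```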

DanielBell99

This worked for me:

config.yaml

cities:
    1: !!python/tuple [0,0]
    2: !!python/tuple [4,0]
    3: !!python/tuple [0,4]
    4: !!python/tuple [4,4]
    5: !!python/tuple [2,2]
    6: !!python/tuple [6,2]
highways:
    - !!python/tuple [1,2]
    - !!python/tuple [1,3]
    - !!python/tuple [1,5]
    - !!python/tuple [2,4]
    - !!python/tuple [3,4]
    - !!python/tuple [5,4]
start: 1
end: 4

main.py

import yaml

def tuple_constructor(loader, node):
    # Load the sequence of values from the YAML node
    values = loader.construct_sequence(node)
    # Return a tuple constructed from the sequence
    return tuple(values)

# Register the constructor with PyYAML's SafeLoader
yaml.SafeLoader.add_constructor(
    'tag:yaml.org,2002:python/tuple',
    tuple_constructor)

# Load the YAML file
with open('config.yaml', 'r') as f:
    data = yaml.load(f, Loader=yaml.SafeLoader)

print(data)

Output:

{'cities': {1: (0, 0), 2: (4, 0), 3: (0, 4), 4: (4, 4), 5: (2, 2), 6: (6, 2)},
'highways': [(1, 2), (1, 3), (1, 5), (2, 4), (3, 4), (5, 4)], 
'start': 1, 
'end': 4}
Anant