
I'm writing a YAML file using the yaml library (PyYAML) in Python 3, and I'd like to control where it puts the line breaks when it writes a long block of text.

Here's a silly example of the kind of thing I'm trying to do. The days entry is a long block of text with several items separated by commas. I'd like to keep each item together on a line, but in this example, "9 Ladies Dancing" gets split.

from yaml import safe_load, safe_dump

s = """\
- title: 12 Days of Christmas
- days: A partridge in a pear tree,
    2 Turtle Doves,
    3 French Hens,
    4 Calling Birds,
    5 Gold Rings,
    6 Geese a-Laying,
    7 Swans a-Swimming,
    8 Maids a-Milking,
    9 Ladies Dancing,
    10 Lords a-Leaping,
    11 Pipers Piping,
    12 Drummers Drumming
"""
l = safe_load(s)

print(safe_dump(l, default_flow_style=False))

This prints out:

- title: 12 Days of Christmas
- days: A partridge in a pear tree, 2 Turtle Doves, 3 French Hens, 4 Calling Birds,
    5 Gold Rings, 6 Geese a-Laying, 7 Swans a-Swimming, 8 Maids a-Milking, 9 Ladies
    Dancing, 10 Lords a-Leaping, 11 Pipers Piping, 12 Drummers Drumming

I'd like to load the full text of days in as a single line, but I want to print it out as several lines up to 80 characters wide to make the items easier to review for correctness. I want several items on a line, but I'd like to split the lines at a comma so items don't get split across lines.

Don Kirkby

2 Answers


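One simple workaround is to set the width to something huge so that nothing wraps at all, but I don't want to do that, because a single long line is harder to review than several lines of up to 80 characters.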
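For reference, the width workaround looks roughly like this. It is a minimal sketch that relies on the width keyword that safe_dump passes through to the dumper; the value 1000 is an arbitrary choice, and anything longer than the text should behave the same way.

```python
from yaml import safe_dump

data = [
    {"title": "12 Days of Christmas"},
    {"days": "A partridge in a pear tree, 2 Turtle Doves, 3 French Hens, "
             "4 Calling Birds, 5 Gold Rings, 6 Geese a-Laying, "
             "7 Swans a-Swimming, 8 Maids a-Milking, 9 Ladies Dancing, "
             "10 Lords a-Leaping, 11 Pipers Piping, 12 Drummers Drumming"},
]

# width=1000 is an arbitrary value larger than the text, so nothing wraps.
one_line = safe_dump(data, default_flow_style=False, width=1000)
print(one_line)
```

No item is split this way, but the whole days entry ends up on one line.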

Hopefully, there's a feature of the yaml library that I haven't found yet, but this is the best I've come up with:

from yaml import safe_load, safe_dump, SafeDumper, dump

s = """\
- title: 12 Days of Christmas
- days: A partridge in a pear tree,
    2 Turtle Doves,
    3 French Hens,
    4 Calling Birds,
    5 Gold Rings,
    6 Geese a-Laying,
    7 Swans a-Swimming,
    8 Maids a-Milking,
    9 Ladies Dancing,
    10 Lords a-Leaping,
    11 Pipers Piping,
    12 Drummers Drumming
"""
l = safe_load(s)

print(safe_dump(l, default_flow_style=False))


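class SplitDumper(SafeDumper):
    """SafeDumper that prefers to wrap plain scalars after a comma."""

    def write_plain(self, text, split=True):
        delimiter = ','
        # Break the text into comma-separated pieces, unless the emitter
        # asked for no splitting.
        pieces = text.split(delimiter) if split else [text]
        buffer = ''
        for i, piece in enumerate(pieces):
            if i > 0:
                # Put back the comma that split() removed.
                buffer += delimiter
            if self.column-1 + len(buffer) + len(piece) <= self.best_width:
                # The piece still fits on the current line.
                buffer += piece
            else:
                # The line is full: write it out, start a new line, and
                # carry the piece over.
                super().write_plain(buffer, split)
                self.write_indent()
                buffer = piece
        # Write whatever is left on the last line.
        super().write_plain(buffer)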

print(dump(l, default_flow_style=False, Dumper=SplitDumper))

The SplitDumper class overrides the write_plain() method to split the text into chunks at the commas, and then works out how many chunks fit on each line.

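The script prints the default splitting, followed by the custom splitting. In the custom output, the continuation lines are indented by one extra space because each chunk keeps the space that followed its comma; YAML folds that whitespace away when the file is loaded.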

- title: 12 Days of Christmas
- days: A partridge in a pear tree, 2 Turtle Doves, 3 French Hens, 4 Calling Birds,
    5 Gold Rings, 6 Geese a-Laying, 7 Swans a-Swimming, 8 Maids a-Milking, 9 Ladies
    Dancing, 10 Lords a-Leaping, 11 Pipers Piping, 12 Drummers Drumming

- title: 12 Days of Christmas
- days: A partridge in a pear tree, 2 Turtle Doves, 3 French Hens, 4 Calling Birds,
     5 Gold Rings, 6 Geese a-Laying, 7 Swans a-Swimming, 8 Maids a-Milking,
     9 Ladies Dancing, 10 Lords a-Leaping, 11 Pipers Piping, 12 Drummers Drumming
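The packing step that SplitDumper performs can also be sketched outside of PyYAML, which makes it easier to test on its own. This is only an illustration: wrap_at_commas is a hypothetical helper, not part of the yaml library, and it ignores the key prefix and indentation that the emitter accounts for through self.column.

```python
def wrap_at_commas(text, width=80, delimiter=","):
    """Greedily pack delimiter-separated items into lines of at most `width`.

    Hypothetical helper for illustration; a single item longer than `width`
    is left on a line of its own rather than being split.
    """
    items = [item.strip() for item in text.split(delimiter)]
    lines = []
    current = ""
    for i, item in enumerate(items):
        # Every item except the last keeps its trailing delimiter.
        chunk = item + (delimiter if i < len(items) - 1 else "")
        candidate = chunk if not current else current + " " + chunk
        if current and len(candidate) > width:
            # The chunk does not fit: close this line and start the next.
            lines.append(current)
            current = chunk
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


days = ("A partridge in a pear tree, 2 Turtle Doves, 3 French Hens, "
        "4 Calling Birds, 5 Gold Rings, 6 Geese a-Laying, "
        "7 Swans a-Swimming, 8 Maids a-Milking, 9 Ladies Dancing, "
        "10 Lords a-Leaping, 11 Pipers Piping, 12 Drummers Drumming")

wrapped = wrap_at_commas(days, width=80)
for line in wrapped:
    print(line)
```

Joining the returned lines with single spaces gives back the original text, which mirrors what the YAML loader does when it folds the continuation lines of a plain scalar.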
Don Kirkby

The string is already a single line after it's loaded:

>>> l
[{'title': '12 Days of Christmas'}, {'days': 'A partridge in a pear tree, 2 Turtle Doves, 3 French Hens, 4 Calling Birds, 5 Gold Rings, 6 Geese a-Laying, 7 Swans a-Swimming, 8 Maids a-Milking, 9 Ladies Dancing, 10 Lords a-Leaping, 11 Pipers Piping, 12 Drummers Drumming'}]

Have you considered using a literal block scalar (|-)? See "Preserve new lines in YAML". Note that a block keeps the line breaks in the loaded string:

>>> s = """\
... - title: 12 Days of Christmas
... - days: |-
...     A partridge in a pear tree,
...     2 Turtle Doves,
...     3 French Hens,
...     4 Calling Birds,
...     5 Gold Rings,
...     6 Geese a-Laying,
...     7 Swans a-Swimming,
...     8 Maids a-Milking,
...     9 Ladies Dancing,
...     10 Lords a-Leaping,
...     11 Pipers Piping,
...     12 Drummers Drumming
... """
>>> safe_load(s)
[{'title': '12 Days of Christmas'}, {'days': 'A partridge in a pear tree,\n2 Turtle Doves,\n3 French Hens,\n4 Calling Birds,\n5 Gold Rings,\n6 Geese a-Laying,\n7 Swans a-Swimming,\n8 Maids a-Milking,\n9 Ladies Dancing,\n10 Lords a-Leaping,\n11 Pipers Piping,\n12 Drummers Drumming'}]
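If a block is acceptable in the file, PyYAML can also be asked to write one. The following is a rough sketch rather than an established recipe: LiteralStr, BlockDumper, and represent_literal are hypothetical names introduced here, and the example assumes the items are separated by a comma and a single space. Because a literal block keeps its newlines, the loading side has to replace them to get back a single line.

```python
import yaml


class LiteralStr(str):
    """Marker type (hypothetical name): strings to emit as literal blocks."""


class BlockDumper(yaml.SafeDumper):
    """SafeDumper subclass, so the representer is not registered globally."""


def represent_literal(dumper, data):
    # style="|" asks the emitter for a literal block scalar.
    return dumper.represent_scalar("tag:yaml.org,2002:str", str(data), style="|")


BlockDumper.add_representer(LiteralStr, represent_literal)

days = ("A partridge in a pear tree, 2 Turtle Doves, 3 French Hens, "
        "4 Calling Birds, 5 Gold Rings, 6 Geese a-Laying")

# Put one item per line in the file.
data = [{"days": LiteralStr(days.replace(", ", ",\n"))}]
text = yaml.dump(data, Dumper=BlockDumper, default_flow_style=False)
print(text)

# Loading keeps the newlines, so collapse them to recover the single line.
loaded = yaml.safe_load(text)[0]["days"].replace("\n", " ")
```

This gives one item per line rather than several items per line, so it trades compactness for never splitting an item.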
Peter Gibson
Yes, I want the text to be loaded into memory as a single line. I just want it to be written into a YAML file as multiple lines to make it easier to review for correctness. By splitting on the commas, each item to review will be kept together. I'll look at using a block, and see if that will work for me. Thanks for the suggestion. – Don Kirkby Jan 10 '18 at 07:26