
We use PyYAML to prep config files for different environments. But our YAML blocks lose integrity.

Given input.yml ...

pubkey: |
    -----BEGIN PUBLIC KEY-----
    MIGfMA0GCSq7OPxRrQEBAQUAA4GNADCBiQKBgQCvRVUKp6pr4qBEnE9lviuyfiNq
    QtG/OCyBDXL4Bh3FmUzfNI+Z4Bh3FmUx+z2n0FCv/4BpgHTDl8D95NPopWVo1RH2
    UfhyMd6dQ/x9T5m+y38JMzmSVAk+Fqu8ya18+yQVOEyEIx3Gxpsgegow33gcxfjK
    EsUgJHXcpw7OPxRrCQIDAQAB
    -----END PUBLIC KEY-----

... executing this program using python3 ...

import yaml

with open('input.yml', mode='r') as f:
    parsed = yaml.safe_load(f)

with open('output.yml', mode='w') as f:
    yaml.dump(parsed, f)

... produces this output.yml ...

pubkey: '-----BEGIN PUBLIC KEY-----

    MIGfMA0GCSq7OPxRrQEBAQUAA4GNADCBiQKBgQCvRVUKp6pr4qBEnE9lviuyfiNq

    QtG/OCyBDXL4Bh3FmUzfNI+Z4Bh3FmUx+z2n0FCv/4BpgHTDl8D95NPopWVo1RH2

    UfhyMd6dQ/x9T5m+y38JMzmSVAk+Fqu8ya18+yQVOEyEIx3Gxpsgegow33gcxfjK

    EsUgJHXcpw7OPxRrCQIDAQAB

    -----END PUBLIC KEY-----

    '

Is it possible to preserve the structure of my block using PyYAML?

Anthon
Chris Betti

1 Answer

Yes, that is possible with PyYAML, but you have to provide your own enhanced versions of at least the Scanner, Parser and Constructor used by safe_load and the Emitter, Serializer and Representer used by dump, plus a specialized string-like class that keeps information about its original formatting.

This is part of what was added to ruamel.yaml (disclaimer: I am the author of that package), which was derived from PyYAML. Using ruamel.yaml, the preferred way of doing this is:

import sys
import ruamel.yaml

yaml_str = """\
pubkey: |
    -----BEGIN PUBLIC KEY-----
    MIGfMA0GCSq7OPxRrQEBAQUAA4GNADCBiQKBgQCvRVUKp6pr4qBEnE9lviuyfiNq
    QtG/OCyBDXL4Bh3FmUzfNI+Z4Bh3FmUx+z2n0FCv/4BpgHTDl8D95NPopWVo1RH2
    UfhyMd6dQ/x9T5m+y38JMzmSVAk+Fqu8ya18+yQVOEyEIx3Gxpsgegow33gcxfjK
    EsUgJHXcpw7OPxRrCQIDAQAB
    -----END PUBLIC KEY-----
"""
yaml = ruamel.yaml.YAML()  # defaults to round-trip
yaml.indent(mapping=4)
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)

Or the older, more PyYAML-like style (which restricts the options you can set):

import sys
import ruamel.yaml as yaml

yaml_str = """\
pubkey: |
    -----BEGIN PUBLIC KEY-----
    MIGfMA0GCSq7OPxRrQEBAQUAA4GNADCBiQKBgQCvRVUKp6pr4qBEnE9lviuyfiNq
    QtG/OCyBDXL4Bh3FmUzfNI+Z4Bh3FmUx+z2n0FCv/4BpgHTDl8D95NPopWVo1RH2
    UfhyMd6dQ/x9T5m+y38JMzmSVAk+Fqu8ya18+yQVOEyEIx3Gxpsgegow33gcxfjK
    EsUgJHXcpw7OPxRrCQIDAQAB
    -----END PUBLIC KEY-----
"""

data = yaml.load(yaml_str, Loader=yaml.RoundTripLoader)
yaml.dump(data, sys.stdout, Dumper=yaml.RoundTripDumper, indent=4)

Both of which give you:

pubkey: |
    -----BEGIN PUBLIC KEY-----
    MIGfMA0GCSq7OPxRrQEBAQUAA4GNADCBiQKBgQCvRVUKp6pr4qBEnE9lviuyfiNq
    QtG/OCyBDXL4Bh3FmUzfNI+Z4Bh3FmUx+z2n0FCv/4BpgHTDl8D95NPopWVo1RH2
    UfhyMd6dQ/x9T5m+y38JMzmSVAk+Fqu8ya18+yQVOEyEIx3Gxpsgegow33gcxfjK
    EsUgJHXcpw7OPxRrCQIDAQAB
    -----END PUBLIC KEY-----

at least with Python 2.7 and 3.5+.

The indent=4 is necessary because the RoundTripDumper defaults to a two-space indent and the original indent of a file is not preserved (not preserving it makes re-indenting a YAML file easier).

If you cannot switch to ruamel.yaml, you should be able to extract all the needed changes from its source. If you can switch, you also get its other features, such as preservation of comments and merge key names.
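If you have to stay on PyYAML and only the dumped output matters, a lighter workaround may be enough. Note that this is not the round-trip approach described above: it does not remember how a scalar was written in the input, it simply asks the dumper to use literal block style for every string that contains a newline. The names `BlockStyleDumper` and `represent_multiline_str` are made up for this sketch.

```python
import yaml


class BlockStyleDumper(yaml.SafeDumper):
    """Hypothetical SafeDumper subclass, so the stock dumper stays untouched."""


def represent_multiline_str(dumper, data):
    # Request literal block style ('|') for multi-line strings only;
    # single-line strings keep PyYAML's default style selection.
    style = "|" if "\n" in data else None
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style=style)


BlockStyleDumper.add_representer(str, represent_multiline_str)

parsed = yaml.safe_load("pubkey: |\n    line one\n    line two\n")
dumped = yaml.dump(parsed, Dumper=BlockStyleDumper, indent=4)
print(dumped)
```

The emitter can still fall back to a quoted style when it decides a string cannot be written as a block scalar (for example, lines with trailing spaces), so treat the requested style as a preference rather than a guarantee.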

Anthon
  • Well done. I'd been poring through the pyyaml source trying to figure out where the extra newline was being added so I could hopefully subclass it, but no luck before I went to bed. – MattDMo Jan 06 '16 at 13:12
  • @MattDMo The extra line in the "normal" dumper is because the string contains newlines. There are multiple ways to represent strings with special characters, and PyYAML selects this one. The selection is made in `emitter.choose_scalar_style()` based on the analysis of the scalar, but how it actually works is indeed difficult to trace. I circumvent all that by creating a special type when reading in and setting its node style explicitly when dumping. – Anthon Jan 06 '16 at 16:29
  • I just installed `ruamel.yaml` and have been playing around with it a bit: very nice. Two quick (off-topic) questions: where did the name come from, and do you have plans to support YAML 1.2? I ask because I have some files I'd like to work with that have as the first line `%YAML 1.2`. I have no idea what the differences between the specs are (even after reading the 1st part of the 1.2 spec), or even whether the files take advantage of any new features in 1.2 - they're mainly text keys with either string or boolean values (no objects). Here's one on github: http://bit.ly/1JXLVbf – MattDMo Jan 06 '16 at 18:01
  • Sorry for the URL shortener, I was running out of space :) – MattDMo Jan 06 '16 at 18:02
  • @MattDMo Yes, I have plans in that direction. Several of my answers here on Stack Overflow already deal with partial implementations, but I want to make the 1.1/1.2 differences run-time selectable (both for load and dump) and also restructure the code. I would have to go over the docs in detail as well as refactor. The 1.2 changes are above all adaptations to make YAML a superset of JSON (1.1 is not, e.g. with respect to floating-point representations). It is a minor release, so no big changes. Ruamel is my company name and its characters are the (interwoven) initials of my children. – Anthon Jan 06 '16 at 19:45
  • I forgot I had announced 1.2 support here. It was added a long time ago already. ruamel.yaml defaults to 1.2 loading, but supports specifying 1.1 when loading or by the `%YAML 1.1` directive. – Anthon Jun 01 '17 at 07:25