7

I am working on a script in which Python first reads a YAML file, makes some changes, and then writes the result back to the file. Loading and updating the values works fine, but when I write the file, the output is a single list rather than separate documents.

testing.yaml

apiVersion: v1
data:
  databag1: try this
  databag2: then try this
kind: ConfigMap
metadata:
  name: data bag info
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: data-bag-service
  name: data-bag-tagging

Code block

import yaml
with open("./testing.yaml", "r") as stream:
    deployment_dict = list(yaml.safe_load_all(stream))

print(deployment_dict)
with open("./testing.yaml", "w") as service_config:
    yaml.dump(
        deployment_dict,
        service_config,
        default_flow_style=False
    )

Transformation I am getting: testing.yaml

- apiVersion: v1
  data:
    databag1: try this
    databag2: then try this
  kind: ConfigMap
  metadata:
    name: data bag info
- apiVersion: extensions/v1beta1
  kind: Deployment
  metadata:
    labels:
      app: data-bag-service
    name: data-bag-tagging

How can I achieve the original state with the --- end-of-directive indicators?

Anthon
Ahsan Naseem
  • The `---` separator at the top level is an alternative to an array block with dashes. The export prefers the array block form. – Klaus D. Nov 05 '18 at 08:32
  • @KlausD. The `---` is a directives end indicator, which you can use even if you have no directives. I am not sure what you mean by "array block", but if you mean sequences, a reference (URL) explaining why this indicator could be an alternative would be useful. – Anthon Nov 05 '18 at 10:12

2 Answers

9

According to the docs:

If you need to dump several YAML documents to a single stream, use the function yaml.dump_all. yaml.dump_all accepts a list or a generator producing Python objects to be serialized into a YAML document.

yaml.dump_all(
    deployment_dict,
    service_config,
    default_flow_style=False
)

You still need default_flow_style=False to get block-style output on PyYAML versions before 5.1; from 5.1 on, block style is the default.

Example code:

import yaml


with open("./testing.yaml", "r") as stream:
    d = list(yaml.safe_load_all(stream))

d.append(d[-1])

with open("./testing2.yaml", "w") as stream:
    yaml.dump_all(
        d,
        stream,
        default_flow_style=False
    )

testing2.yaml

apiVersion: v1
data:
  databag1: try this
  databag2: then try this
kind: ConfigMap
metadata:
  name: data bag info
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: data-bag-service
  name: data-bag-tagging
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: data-bag-service
  name: data-bag-tagging
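
To see the difference without touching any files, here is a minimal in-memory sketch. The two dictionaries are cut-down stand-ins for the documents in the question, and the variable names are only illustrative. It uses the safe_dump / safe_dump_all variants, and explicit_start=True is shown as one way to also get a `---` in front of the first document.

```python
import yaml

# Two small documents standing in for the ConfigMap and Deployment above.
docs = [
    {"kind": "ConfigMap", "metadata": {"name": "data bag info"}},
    {"kind": "Deployment", "metadata": {"name": "data-bag-tagging"}},
]

# dump() treats the list as ONE document, so it is written as a YAML sequence.
as_list = yaml.safe_dump(docs, default_flow_style=False)

# dump_all() writes each list element as its own document, separated by '---'.
as_stream = yaml.safe_dump_all(docs, default_flow_style=False)

# explicit_start=True also emits '---' before the first document.
as_explicit = yaml.safe_dump_all(
    docs, default_flow_style=False, explicit_start=True
)

print(as_list)
print(as_stream)
print(as_explicit)
```

Loading as_stream back with yaml.safe_load_all gives the same two dictionaries, so the round trip is lossless for plain data like this.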
3

PyYAML is not really made for this kind of round-trip update: it drops any comments you might have and doesn't necessarily preserve the key order of mappings.
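
A small sketch of what that means in practice. The input string and its comments are made up for illustration; the call relies on PyYAML's default behaviour of sorting mapping keys on dump.

```python
import yaml

# Hypothetical input with comments and a deliberate key order.
source = (
    "# managed by ops\n"
    "name: data-bag-tagging  # service name\n"
    "app: data-bag-service\n"
)

# Load and dump again without changing anything.
round_tripped = yaml.safe_dump(yaml.safe_load(source), default_flow_style=False)

# Both comments are gone, and the keys come back sorted alphabetically.
print(round_tripped)
```

The data itself survives, but anything a human added for other humans (comments, ordering) does not.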

I recommend you take a look at ruamel.yaml (disclaimer: I am the author of that package) for several reasons, including, but not limited to:

  • support of YAML 1.2 (but can write/read YAML 1.1 if necessary)
  • preservation of comments, key order, anchor/alias names, float/integer formats
  • finer control over indentation of mappings and lists
  • no need to load all the documents, process them and dump them in one go
  • optional preservation of quotes and/or block style scalars
  • safe loading by default, and a warning if you use the unsafe load in the backwards compatible API
  • many bug fixes


from pathlib import Path
from ruamel.yaml import YAML

path = Path('testing.yaml')
tmp_path = path.with_suffix('.yaml.tmp')


with YAML(output=tmp_path) as yaml:
    # yaml.indent(mapping=4, sequence=4, offset=2)
    # yaml.preserve_quotes = True
    for data in yaml.load_all(path):
        # update data
        yaml.dump(data)

path.unlink()
tmp_path.rename(path)

print(path.read_text(), end='')

which gives:

apiVersion: v1
data:
  databag1: try this
  databag2: then try this
kind: ConfigMap
metadata:
  name: data bag info
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: data-bag-service
  name: data-bag-tagging

Please note that you cannot read from and write to the same file, because you are processing one document at a time. Hence the temporary file, which has the additional advantage that if an error occurs while updating the last document and your program crashes, you are not left with a half-written YAML stream.
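
The temporary-file idea is independent of the YAML library, so here is a standard-library-only sketch of it. The helper name replace_atomically is hypothetical, and it uses os.replace (which overwrites the target in one step) rather than the unlink-then-rename shown above. It assumes the temporary file and the target are on the same filesystem, which is why the temporary file is created next to the target.

```python
import os
import tempfile
from pathlib import Path


def replace_atomically(path: Path, new_text: str) -> None:
    """Write new_text to a temp file beside path, then swap it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_text)
        # The original file stays intact until this single call succeeds.
        os.replace(tmp_name, path)
    except BaseException:
        # Writing or replacing failed: drop the temp file, keep the original.
        os.unlink(tmp_name)
        raise


# Demonstration in a scratch directory.
workdir = Path(tempfile.mkdtemp())
target = workdir / "testing.yaml"
target.write_text("kind: ConfigMap\n")

replace_atomically(target, "kind: ConfigMap\n---\nkind: Deployment\n")
print(target.read_text(), end="")
```

If the write fails halfway, the original testing.yaml is untouched and only the temporary file is discarded.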

Anthon