I'm attempting to write YAML files from Python that contain code snippets in various languages, and I want these snippets to be human-legible in the resulting YAML file, using |
-style literals with indented multiline strings.
Various answers such as this one suggest using yaml.representer()
to set style='|'
only for strings that actually contain newlines, but for simplicity we'll suppose we want to print all strings in the |
style. This should be achievable by calling yaml.dump(..., default_style="|")
.
For some strings, this works fine.
However, for the string "foobar\n "
(an 8-char string ending in a newline and a space), it refuses to write the literal in multiline form, and instead quotes the string with escaped newlines.
At first I suspected that |
in the YAML spec somehow doesn't support lines that contain only spaces, but this is not the case. The following code shows my desired, hand-written YAML translation is valid and will yaml.load()
correctly:
data = "foobar\n " # Data ends in a newline and a space
desired_yaml = """\
|
foobar
""" # Note final YAML line has 3 spaces - 2 for indent, 1 for data
# Demonstrate that `desired_yaml` is equivalent to `data`:
import yaml
assert yaml.safe_load(desired_yaml) == data
If I try to generate this YAML programmatically, though, PyYAML insists on quoting/escaping the string:
dumped = yaml.safe_dump(data, default_style="|")
assert dumped != desired_yaml
print(dumped)
The printed output shows that dumped
is now a quoted string containing escaped newlines:
"foobar\n "
I'd like the YAML output to instead look like this -- a three-line string ending in three spaces:
| foobar
How can I achieve this with PyYAML? If this is not possible, are there third-party alternatives that would work?