lxml's pretty_print option appears to be ignored when the input contains a '\n' character. For example:

import lxml.etree as etree

strs = ["<root>\n<e1/><e2/></root>",
        "<root><e1/><e2/></root>"]

for str in strs:
    xml = etree.fromstring(str)
    print etree.tostring(xml, pretty_print=True)

The output is:

<root>
<e1/><e2/></root>

<root>
  <e1/>
  <e2/>
</root>

Both strings are valid XML. The first one contains a '\n' right after <root>, and its output is not pretty-printed: everything after that newline is serialized exactly as it was parsed.

Is this an lxml bug, or do I need to do something extra to get pretty formatting?

alevko

1 Answer

Thank you, Corley

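The reason for this behavior is explained in the lxml FAQ: http://lxml.de/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output

In short, pretty_print only adds whitespace where an element has no text content, so that it cannot change the meaning of the document. The '\n' after <root> is parsed as text, so the first string is left as it is. Telling the parser to drop whitespace-only text fixes this.

The correct code is: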
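To see the whitespace the FAQ is talking about, you can inspect the parsed tree directly. This is only a small sketch (Python 3), not part of the original answer. It imports lxml and, as my own assumption for convenience, falls back to the standard library's xml.etree.ElementTree when lxml is not installed, since both expose the same .text attribute.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the stdlib ElementTree exposes .text the same way
    import xml.etree.ElementTree as etree

with_newline = etree.fromstring("<root>\n<e1/><e2/></root>")
without_newline = etree.fromstring("<root><e1/><e2/></root>")

# The "\n" is kept as a text node in front of <e1/>
print(repr(with_newline.text))
# No text node here, so the serializer is free to add indentation
print(repr(without_newline.text))
```

The first root carries a text node and the second does not, which matches the difference in the pretty-printed output.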

import lxml.etree as etree

strs = ["<root>\n<e1/><e2/></root>",
    "<root><e1/><e2/></root>"]

parser = etree.XMLParser(remove_blank_text=True)
for str in strs:
    xml = etree.fromstring(str, parser=parser)
    # Python 2:
    print etree.tostring(xml, pretty_print=True)

    # Python 3: tostring() returns bytes, so decode it before printing
    # (here I assume utf-8 encoding):
    # print(etree.tostring(xml, pretty_print=True).decode())
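If the tree has already been parsed and you cannot pass your own parser, a possible workaround is to strip the whitespace-only text yourself before serializing. The sketch below is an assumption on my part, not something from the FAQ: strip_blank_text is a hypothetical helper name, and the fallback import of the standard library's ElementTree is only there so the snippet runs without lxml.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: stdlib ElementTree is close enough for .text/.tail handling
    import xml.etree.ElementTree as etree


def strip_blank_text(root):
    """Hypothetical helper: drop whitespace-only text and tail nodes in place."""
    for el in root.iter():
        if el.text is not None and not el.text.strip():
            el.text = None
        if el.tail is not None and not el.tail.strip():
            el.tail = None
    return root


root = strip_blank_text(etree.fromstring("<root>\n<e1/> <e2/>\n</root>"))
print(repr(root.text), repr(root[0].tail), repr(root[1].tail))
```

With lxml, calling etree.tostring(root, pretty_print=True) after this should indent the tree in the same way as with remove_blank_text=True. Note that this also removes whitespace that may be significant in mixed-content documents.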
x-yuri
alevko