
I want to load an XML template from one file, modify it, and save the result to a new file with formatting. However, pretty printing is not adding the desired formatting. Other solutions on Stack Overflow cover the case where the tree is written back to the same file, but not to a new one. For example:

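from lxml import etree as ET

# remove_blank_text drops ignorable whitespace between elements while parsing
parser = ET.XMLParser(remove_blank_text=True)
tree = ET.parse("template.xml", parser)
root = tree.getroot()

# Append the new elements to the template's root
A = ET.SubElement(root, "A")
ET.SubElement(A, "a")
B = ET.SubElement(root, "B")
ET.SubElement(B, "b")

# Write the modified tree to a new file
tree.write("output.xml", pretty_print=True)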

template.xml

<document>
</document>

output.xml is written without formatting

<document>
<A><a/></A><B><b/></B></document>
Paul
  • Have you had a look at the thread over here: [lxml_why_u_no_format](https://stackoverflow.com/questions/5086922/python-pretty-xml-printer-with-lxml/9612463) – cullzie Jan 31 '19 at 06:29

1 Answer


Edit template.xml so that the root element contains no whitespace, like this:

<document></document>

Then run your code again and you will get this:

<document>
  <A>
    <a/>
  </A>
  <B>
    <b/>
  </B>
</document>

But the important question is: why?

The answer can be found in the lxml documentation, which states:

Pretty printing (or formatting) an XML document means adding white space to the content. These modifications are harmless if they only impact elements in the document that do not carry (text) data. They corrupt your data if they impact elements that contain data. If lxml cannot distinguish between whitespace and data, it will not alter your data. Whitespace is therefore only added between nodes that do not contain data. This is always the case for trees constructed element-by-element.
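If editing the template is not an option, the same effect can be had in code. In the question's template, the newline between <document> and </document> ends up as the root's text, which lxml then treats as data. The sketch below assumes that whitespace-only text in the template carries no meaning and clears it before adding elements. `clear_blank_text` is a hypothetical helper name, not part of lxml, and the template is parsed from an inline string instead of template.xml so the example is self-contained.

```python
from lxml import etree as ET


def clear_blank_text(root):
    """Set whitespace-only text and tail values to None.

    Assumption: whitespace-only text in the template is not real data.
    """
    # iter("*") visits elements only, skipping comments and processing instructions
    for element in root.iter("*"):
        if element.text is not None and not element.text.strip():
            element.text = None
        if element.tail is not None and not element.tail.strip():
            element.tail = None


parser = ET.XMLParser(remove_blank_text=True)
# Same content as the question's template.xml, inlined for the example
root = ET.fromstring("<document>\n</document>", parser)

clear_blank_text(root)

A = ET.SubElement(root, "A")
ET.SubElement(A, "a")
B = ET.SubElement(root, "B")
ET.SubElement(B, "b")

result = ET.tostring(root, pretty_print=True).decode()
print(result)
```

With a tree loaded through ET.parse, the same helper can be called on tree.getroot() before tree.write("output.xml", pretty_print=True). Newer lxml releases (4.5 and later, as far as I know) also provide ET.indent(tree), which rewrites the whitespace in place and may be the simpler option there.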

Anwarvic