16

I have an XML document which I'm pretty-printing using lxml.etree.tostring:

print(etree.tostring(doc, pretty_print=True, encoding="unicode"))

The default level of indentation is 2 spaces, and I'd like to change this to 4 spaces. There isn't any argument for this in the tostring function; is there a way to do this easily with lxml?

Eli Courtwright

4 Answers

13

Since lxml 4.5, you can set the indent size with the etree.indent() function, which modifies the tree in place:

etree.indent(root, space="    ")
print(etree.tostring(root, encoding="unicode"))
kuch
5

As said in this thread, there is no real way to change the indent of the lxml.etree.tostring pretty-print.

But you can:

  • add an XSLT transform to change the indent
  • add whitespace to the tree yourself, with something like the helper from the cElementTree library

code:

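def indent(elem, level=0, space="    "):
    """Add indentation whitespace to the tree in place (4 spaces by default)."""
    i = "\n" + level * space
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + space
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for child in elem:
            indent(child, level + 1, space)
        # The last child's tail precedes this element's closing tag,
        # so it goes back to this element's own level.
        if not child.tail or not child.tail.strip():
            child.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i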
ThibThib
  • The link to the lxml-dev list archives from Feb 2009 is broken; it's at: https://mailman-mail5.webfaction.com/pipermail/lxml/20090208/012298.html . But anyway, kludging whitespace into actual tree elements seems nasty; haven't people asked for this as an enhancement? – smci Dec 29 '18 at 09:10
  • 2
    @smci: There is a built-in `indent()` function since lxml 4.5 (released 2020-01-29). See the answer from @kuch. – mzjn Feb 21 '23 at 16:30
1

This can be done easily using XMLParser and indent(). There is no need for pretty_print:

parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse('myfile.xml', parser)
etree.indent(tree, space="    ")
tree.write('myfile.xml', encoding='UTF-8')
0

You may check this solution. Changing the space value lets you get any indent you want: a different number of spaces, or one or more tab ("\t") characters.
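
To illustrate how the space value controls the result, here is a minimal sketch, not a definitive implementation. The helper name reindent and the sample string are made up for this example. It assumes lxml 4.5 or newer; if lxml is not installed it falls back to the standard library's xml.etree.ElementTree, whose indent() (Python 3.9+) accepts the same space argument.

```python
try:
    from lxml import etree  # etree.indent() requires lxml >= 4.5
except ImportError:
    # Assumption for portability: the standard library provides the same
    # indent(tree, space=...) call since Python 3.9.
    import xml.etree.ElementTree as etree


def reindent(xml_text, space="    "):
    """Parse xml_text and return it re-indented with `space` per level.

    `reindent` is a hypothetical helper name, not part of lxml.
    """
    root = etree.fromstring(xml_text)
    etree.indent(root, space=space)  # modifies the tree in place
    # encoding="unicode" returns str instead of bytes
    return etree.tostring(root, encoding="unicode")


sample = "<root><a><b>text</b></a></root>"
print(reindent(sample))              # four spaces per level
print(reindent(sample, space="\t"))  # one tab per level
```

Because indent() rewrites the whitespace stored in the tree itself, pretty_print=True is not needed when serializing afterwards.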