I have an XML file located at "C:\Program Files (x86)\Microsoft SQL Server\100\Setup Bootstrap\Log\20210331_124249\Datastore_GlobalRules\Datastore_Discovery.xml". The whole file is a single line; you can view it here.

In that form it is hard to read and hard to extract information from, so I searched Google for a way to beautify XML with Python and found this question: Pretty printing XML in Python

The first two answers didn't give me what I wanted (the printed XML was still ugly), but the third answer did:

from xml.etree import ElementTree

def indent(elem, level=0):
    i = "\n" + level*"  "
    j = "\n" + (level-1)*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for subelem in elem:
            indent(subelem, level+1)
        if not elem.tail or not elem.tail.strip():
            elem.tail = j
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = j
    return elem        

xml = ElementTree.parse('C:/Program Files (x86)/Microsoft SQL Server/100/Setup Bootstrap/Log/20210331_124249/Datastore_GlobalRules/Datastore_Discovery.xml').getroot()
indent(xml)
ElementTree.dump(xml)

This is the output: output.xml

However, I can't redirect the output to an XML file.

I first tried this:

out = open('C:/Output.xml','w')
out.write(ElementTree.dump(xml))
out.close()

It gave this error:

TypeError: write() argument must be str, not None

Then I tried this:

xml.write('C:/output.xml')

It gave this error:

AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'write'

If I use this:

ElementTree.dump(xml).write('C:/output.xml')

it results in this error:

AttributeError: 'NoneType' object has no attribute 'write'

How can I redirect the output of ElementTree.dump(xml) to an XML file? I am sorry if this question is trivial, but I am very new to Python. Any help is truly appreciated.


P.S. I produced the output file by copy-pasting the output from the console window.

  • You should check out the documentation for `dump`: https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.dump. It says `Writes an element tree or element structure to sys.stdout. This function should be used for debugging only.` – Chris Doyle Mar 31 '21 at 08:49
  • In Python 3.9, there is a built-in `indent` method for pretty-printing with ElementTree: https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.indent – mzjn Mar 31 '21 at 08:56
  • @mzjn Nice one, I didn't know that. I just posted an answer using lxml because of its built-in pretty print. – Chris Doyle Mar 31 '21 at 08:56
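
As the documentation quoted in the comments says, `dump` writes to `sys.stdout`; it also returns `None`, which would explain all three errors in the question. The sketch below, on a made-up two-element document, shows the `None` return value and one way to capture the text anyway with `contextlib.redirect_stdout`. Capturing `dump` like this is only an illustration, since the function is documented as a debugging aid.

```python
import contextlib
import io
from xml.etree import ElementTree

# Small illustrative document (not the asker's file).
elem = ElementTree.fromstring("<a><b>hello</b></a>")

# Temporarily point sys.stdout at an in-memory buffer while dump() runs.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    result = ElementTree.dump(elem)

captured = buffer.getvalue()

print(result is None)    # dump() has no return value
print(captured, end="")  # the text dump() would have printed
```

Because `result` is `None`, passing it to `out.write(...)` or calling `.write(...)` on it fails in the ways shown in the question.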

1 Answer

The dump function writes its output to sys.stdout and returns None, so there is nothing to pass on to a file. You could use the lxml module instead, which has a built-in pretty-print feature.

from lxml import etree

data = r"""<a><b>hello</b><c>world</c><d><e>foo</e><f>bar</f></d></a>"""
tree = etree.fromstring(data)
print(etree.tostring(tree, pretty_print=True).decode())

OUTPUT

<a>
  <b>hello</b>
  <c>world</c>
  <d>
    <e>foo</e>
    <f>bar</f>
  </d>
</a>

However, if you want to use only ElementTree, since it is built in, together with your own function, then you need to call the tostring function rather than dump.

from xml.etree import ElementTree

def indent(elem, level=0):
    i = "\n" + level*"  "
    j = "\n" + (level-1)*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for subelem in elem:
            indent(subelem, level+1)
        if not elem.tail or not elem.tail.strip():
            elem.tail = j
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = j
    return elem

data = r"""<a><b>hello</b><c>world</c><d><e>foo</e><f>bar</f></d></a>"""
tree = ElementTree.fromstring(data)
indent(tree)
print(ElementTree.tostring(tree).decode())

But as you can see, the result is not as pretty as it should be; not everything is nested correctly.

OUTPUT

<a>
  <b>hello</b>
<c>world</c>
<d>
    <e>foo</e>
  <f>bar</f>
  </d>
</a>
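
The misalignment above appears to come from the helper giving each leaf element a tail that is one level too shallow. As a possible alternative, assuming Python 3.9 or later is available, the standard library's own `ElementTree.indent` (mentioned in the comments on the question) can be applied to the same sample data. This is a sketch, not a drop-in replacement for the helper: `ElementTree.indent` modifies the tree in place and returns nothing.

```python
from xml.etree import ElementTree

data = "<a><b>hello</b><c>world</c><d><e>foo</e><f>bar</f></d></a>"
tree = ElementTree.fromstring(data)

# Requires Python 3.9+. Adjusts whitespace in place, two spaces per level.
ElementTree.indent(tree, space="  ")

# encoding="unicode" makes tostring() return str instead of bytes.
pretty = ElementTree.tostring(tree, encoding="unicode")
print(pretty)
```

With this sample the nesting matches the lxml output shown earlier.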
Chris Doyle
  • Thank you, your answer is good, but as a small reminder, you left out the part that redirects the output: `out=open('C:/Path/to/file','w')`, `print(ElementTree.tostring(tree).decode(), file=out)`, `out.close()`. I worked those steps out myself, although I could not redirect the output with the methods I first tried. Anyway, thanks, and please include the actual code to redirect the output. –  Mar 31 '21 at 09:12
  • Well, your fundamental issue was converting the XML into a nice string to write to a file. You already knew how to write a string to a file, so I didn't see it as relevant to include that in the answer. It's the same reason I didn't include loading the XML from a file, only from a string: the problem you had was converting the XML to a pretty string, not loading a file or writing to one. – Chris Doyle Mar 31 '21 at 09:28
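
For completeness, the file-writing step discussed in these comments could look like the sketch below. The sample data and the temporary output paths are placeholders chosen for illustration, not the asker's real paths. It shows two options: writing the string returned by `tostring`, and wrapping the root `Element` in an `ElementTree` object, which (unlike `Element`) does have a `write` method.

```python
import os
import tempfile
from xml.etree import ElementTree

# Illustrative data; in practice the root would come from ElementTree.parse(path).getroot().
root = ElementTree.fromstring("<a><b>hello</b><c>world</c></a>")

# Placeholder output locations inside a temporary directory.
out_dir = tempfile.mkdtemp()
path_str = os.path.join(out_dir, "output_str.xml")
path_tree = os.path.join(out_dir, "output_tree.xml")

# Option 1: serialise to str, then write it like any other text.
with open(path_str, "w", encoding="utf-8") as out:
    out.write(ElementTree.tostring(root, encoding="unicode"))

# Option 2: wrap the Element in an ElementTree and let it write the file,
# including an XML declaration.
ElementTree.ElementTree(root).write(path_tree, encoding="utf-8", xml_declaration=True)

with open(path_str, encoding="utf-8") as f:
    written_str = f.read()
with open(path_tree, encoding="utf-8") as f:
    written_tree = f.read()
```

Using `with open(...)` closes the file even if serialisation raises, which the explicit `open`/`close` pair does not guarantee.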