I'm trying to use the tostring function in lxml to get a "pretty" version of my XML as a string. The tutorial on the lxml site shows this example:

>>> import lxml.etree as etree
>>> root = etree.Element("root")
>>> print(root.tag)
root
>>> root.append( etree.Element("child1") )
>>> child2 = etree.SubElement(root, "child2")
>>> child3 = etree.SubElement(root, "child3")
>>> print(etree.tostring(root, pretty_print=True))
<root>
  <child1/>
  <child2/>
  <child3/>
</root>

However, my output from running those exact lines is:

b'<root>\n  <child1/>\n  <child2/>\n  <child3/>\n</root>\n'

Is there a bug in the version of lxml I have installed? It seems odd that the word-for-word example from the tutorial doesn't work.

lanteau

1 Answer

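The b prefix in front of the string shows that it's a bytes object (a byte string). In Python 3, print() displays a bytes object as its literal representation, which is why you see the escaped \n sequences instead of line breaks. To print it as a str (Python 3's Unicode text type), decode it first: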

print(etree.tostring(root,pretty_print=True).decode())

Alternatively, etree.tostring accepts an encoding argument, and passing encoding='unicode' makes it return a str directly:

print(etree.tostring(root,pretty_print=True,encoding='unicode'))

Either way works for me. Here's more information on Byte Strings and Strings
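
To see why decoding changes what gets printed, here is a minimal sketch that needs no lxml at all. It assumes the bytes literal from the question stands in for the return value of etree.tostring(root, pretty_print=True); the variable names are just for illustration.

```python
# The exact bytes value from the question, standing in for the result of
# etree.tostring(root, pretty_print=True) so this runs without lxml installed.
raw = b'<root>\n  <child1/>\n  <child2/>\n  <child3/>\n</root>\n'

# print(raw) displays the repr of a bytes object: b'...' with escaped newlines.
as_printed = repr(raw)

# Decoding produces a str, whose newlines print as real line breaks.
text = raw.decode()  # decode() defaults to UTF-8

print(as_printed)
print(text)
```

The first print reproduces the single-line b'...' output from the question, and the second prints the indented XML over several lines.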

Adam Smith
  • Interesting, that works for me also. Is this just an error in their documentation? Did something in Python change to make specifying the encoding necessary when it wasn't before? – lanteau Mar 28 '14 at 16:32
  • @lanteau I honestly don't know. I don't use `lxml` that much, but another section of the `lxml` docs shows `etree.tostring` returning a `bytes` object when given the `method='text'` flag, so perhaps that was set as a default in a newer version of `lxml`? – Adam Smith Mar 28 '14 at 16:36
  • It is not fixed in the latest release available on pip, `lxml==3.5.0`. – Harald Nordgren Jan 27 '17 at 00:52