Does anybody know why this code works:

import xml.etree.ElementTree as ET

text='''<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
</data>'''
root=ET.fromstring(text)
ET.tostring(root, method='xml')
ET.tostring(root, encoding='UTF-8', method='xml')

But it fails when I pass encoding='Unicode': ET.tostring(root, encoding='Unicode', method='xml')

I get:

 Traceback (most recent call last):
  ... omitted ...
  File "/home/ago/anaconda3/lib/python3.6/xml/etree/ElementTree.py", line 915, in _serialize_xml
    write("<" + tag)
TypeError: a bytes-like object is required, not 'str'


According to the Python 3.6 documentation, I can use 'Unicode' with tostring:

"Use encoding="unicode" to generate a Unicode string (otherwise, a bytestring is generated)."

I can use ElementTree.write(... encoding='Unicode' ...) with no issue.

I use:

Python 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 12:22:00) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux

With

Python 3.4.2 (default, Oct  8 2014, 10:45:20) 
[GCC 4.9.1] on linux

I get a different error message:

ET.tostring(root, encoding='Unicode')
Traceback (most recent call last):
  ... omitted ...
  File "/usr/lib/python3.4/xml/etree/ElementTree.py", line 917, in _serialize_xml
    write("<" + tag)
TypeError: 'str' does not support the buffer interface

Thanks in advance.

agossino
  • The errors go away if you use `encoding='unicode'` (lower-case 'u'), don't they? – mzjn Apr 27 '17 at 11:49
  • Bingo!! Thanks mzjn, I misread what I quoted in my question: **"Use encoding="unicode" to generate a Unicode string (otherwise, a bytestring is generated)."** ... ElementTree.write() uses `enc_lower = encoding.lower()` to lower-case the string. Even though tostring seems to call ElementTree.write, it behaves differently. I will dig into the ET code ... – agossino Apr 27 '17 at 21:05

1 Answer


'Unicode' isn't a valid encoding: it is not an encoding but a set of code points (see UTF-8 vs. Unicode).

Python 3 stores strings as Unicode, so a string has to be encoded first (for example to UTF-8) to become a bytes-like object.

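ElementTree.fromstring accepts the str used in the question, so the parsing step is not the problem. The traceback comes from ET.tostring, which, as the comments on the question note, returns a str only when the encoding is spelled exactly "unicode" in lower case. With 'Unicode' it apparently ends up writing str data into a bytes buffer, hence the TypeError.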
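
For the tostring call in the question, the following is a minimal sketch of how the spelling of the encoding argument changes the result. The helper name result_kind and the shortened XML are made up for illustration, and what the capitalised 'Unicode' does may depend on the Python version, so no output is claimed for it.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the document from the question.
root = ET.fromstring(
    '<data><country name="Liechtenstein"><rank>1</rank></country></data>'
)


def result_kind(encoding):
    """Return the type name of tostring()'s result, or the exception name if it fails."""
    try:
        return type(ET.tostring(root, encoding=encoding, method="xml")).__name__
    except (TypeError, LookupError) as exc:
        return type(exc).__name__


# Lower-case "unicode" is the special value the documentation describes.
print("unicode ->", result_kind("unicode"))
# A real codec name produces a byte string.
print("UTF-8   ->", result_kind("UTF-8"))
# The capitalised spelling from the question (a TypeError on 3.4 and 3.6).
print("Unicode ->", result_kind("Unicode"))
```

If a str is what you want, spelling the value in lower case, as mzjn suggests in the comments, is the simplest way out.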

As a side note, the reverse is to take a bytes-like object and decode it to a Unicode string, using the encoding the bytes were written in.
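
The encode and decode steps can be sketched with plain strings. This is only an illustration: the sample text and variable names are made up, and UTF-8 is picked as one possible encoding.

```python
# A Python 3 str is a sequence of Unicode code points.
text = "Österreich"

# Encoding turns the str into a bytes object using a concrete encoding.
encoded = text.encode("utf-8")

# Decoding with the same encoding restores the original str.
decoded = encoded.decode("utf-8")

print(type(encoded).__name__, len(encoded))  # bytes 11 ("Ö" takes two bytes in UTF-8)
print(type(decoded).__name__, len(decoded))  # str 10
print(decoded == text)                       # True
```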
