
I am trying to generate .xml files with Cyrillic characters in them, but the result is unexpected. What is the simplest way to avoid this? Example:

from lxml import etree

root = etree.Element('пример')

print(etree.tostring(root))

What I get is:

b'<&#1087;&#1088;&#1080;&#1084;&#1077;&#1088;/>'

Instead of:

b'<пример/>'
meathme
  • I'm not actually trying to use Cyrillic tag names; that is only an example. What I really want is to place Cyrillic characters between tags. – meathme Apr 20 '15 at 14:27

1 Answer


etree.tostring() without additional arguments outputs ASCII-only data as a bytes object, replacing non-ASCII characters with numeric character references. You could use etree.tounicode():

>>> from lxml import etree
>>> root = etree.Element('пример')
>>> print(etree.tostring(root))
b'<&#1087;&#1088;&#1080;&#1084;&#1077;&#1088;/>'
>>> print(etree.tounicode(root))
<пример/>

or specify a codec with the encoding argument; you'd still get bytes however, so the output would need to be decoded again:

>>> print(etree.tostring(root, encoding='utf8'))
b'<\xd0\xbf\xd1\x80\xd0\xb8\xd0\xbc\xd0\xb5\xd1\x80/>'
>>> print(etree.tostring(root, encoding='utf8').decode('utf8'))
<пример/>

Setting the encoding to 'unicode' gives you the same output tounicode() produces, and is the preferred spelling:

>>> print(etree.tostring(root, encoding='unicode'))
<пример/>
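
The same arguments apply when the Cyrillic characters sit between tags rather than in the tag name, which is the case mentioned in the comment on the question. Here is a minimal sketch; the element names root and item are hypothetical and used only for illustration:

```python
from lxml import etree

# Hypothetical element names, chosen only for this illustration.
root = etree.Element('root')
item = etree.SubElement(root, 'item')
item.text = 'пример'  # Cyrillic text between the tags

# Default call: ASCII-only bytes, so the Cyrillic text is escaped.
ascii_bytes = etree.tostring(root)

# UTF-8 bytes with an XML declaration, suitable for writing to a file
# opened in binary mode.
utf8_bytes = etree.tostring(root, encoding='utf-8', xml_declaration=True)

# encoding='unicode' returns a str instead of bytes.
text = etree.tostring(root, encoding='unicode')

print(ascii_bytes)
print(utf8_bytes.decode('utf-8'))
print(text)
```

If the goal is a .xml file on disk, writing utf8_bytes to a file opened with mode 'wb' should keep the characters readable in any UTF-8 aware editor.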
Martijn Pieters