I am learning Python (version 2.7) and have a task: validate an XML document against an XSD schema using the lxml library (http://lxml.de/). I have two files, for example:
$ cat 1.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE yml_catalog SYSTEM "shops.dtd">
<a>
<b>Привет мир!</b>
</a>
and
$ cat 2.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:element name="a" type="AType"/>
<xs:complexType name="AType">
<xs:sequence>
<xs:element name="b" type="xs:decimal" />
</xs:sequence>
</xs:complexType>
</xs:schema>
It should be very simple, but I don't understand how to use lxml with UTF-8 (I have never worked much with encodings). Here is what I do:
>>> from lxml import etree
>>> schema = etree.parse("/tmp/qwerty/2.xsd")
>>> xmlschema = etree.XMLSchema(schema)
>>> try:
...     document = etree.parse("/tmp/qwerty/1.xml")
...     print "Parse complete!"
... except etree.XMLSyntaxError, e:
...     print e
...
Parse complete!
>>> xmlschema.validate(document)
False
>>> xmlschema.error_log
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    xmlschema.error_log
  File "xmlerror.pxi", line 286, in lxml.etree._ListErrorLog.__repr__ (src/lxml/lxml.etree.c:33216)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 85-90: ordinal not in range(128)
So I cannot read any of the errors collected in .error_log.
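A minimal sketch, assuming lxml is installed and that the errors only need to be read one by one rather than shown through `repr()` of the whole log: iterate over `error_log` and use each entry's `line` and `message` attributes. The function name `validate` and the in-memory `XSD_BYTES`/`XML_BYTES` copies of the two files are hypothetical; the `DOCTYPE` line is left out only to keep the sketch self-contained. It is written to run on Python 2.7 and on Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch (Python 2.7 and 3): validate XML against an XSD with lxml and read
# the schema errors entry by entry instead of printing the whole error_log,
# whose repr() is what raised UnicodeEncodeError in the session above.
from __future__ import print_function

from lxml import etree

# Hypothetical in-memory copies of 2.xsd and 1.xml. Both are encoded to UTF-8
# bytes because lxml rejects str/unicode input that has an encoding declaration.
XSD_BYTES = u"""<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:element name="a" type="AType"/>
<xs:complexType name="AType">
<xs:sequence>
<xs:element name="b" type="xs:decimal" />
</xs:sequence>
</xs:complexType>
</xs:schema>
""".encode("utf-8")

XML_BYTES = u"""<?xml version="1.0" encoding="UTF-8"?>
<a>
<b>Привет мир!</b>
</a>
""".encode("utf-8")


def validate(xml_bytes, xsd_bytes):
    """Return (is_valid, errors); errors is a list of (line, message) pairs."""
    schema = etree.XMLSchema(etree.fromstring(xsd_bytes))
    document = etree.fromstring(xml_bytes)
    is_valid = schema.validate(document)
    # Iterating error_log yields individual log entries; each exposes
    # .line, .column and .message as plain attributes.
    errors = [(entry.line, entry.message) for entry in schema.error_log]
    return is_valid, errors


if __name__ == "__main__":
    ok, errors = validate(XML_BYTES, XSD_BYTES)
    print("valid:", ok)
    for line, message in errors:
        # On Python 2 `bytes is str`, so encode explicitly to UTF-8 there and
        # the ASCII codec is never involved; on Python 3 print the text as-is.
        print(line, message.encode("utf-8") if bytes is str else message)
```

With the real files, `etree.parse("/tmp/qwerty/1.xml")` can stand in for `etree.fromstring(...)`, since `validate()` accepts a parsed tree as well as an element. Passing an `io.BytesIO` object to `etree.parse()` also works if the XML is already in memory.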
Is there a workaround with the encode/decode methods that lets the check run to completion and show the errors, or some other solution that needs no extra library (only standard Python)? Or do I need to use StringIO, and if so, how?
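To look at the encode/decode side in isolation, here is a stdlib-only sketch (no lxml needed) using the text from 1.xml. The traceback shows the 'ascii' codec being asked to encode Cyrillic characters, which it cannot represent; an explicit UTF-8 encode succeeds and decodes back unchanged. The helper `try_encode` is hypothetical, for illustration only.

```python
# -*- coding: utf-8 -*-
# Sketch (Python 2.7 and 3): the 'ascii' codec cannot encode Cyrillic text,
# while an explicit UTF-8 encode/decode round-trip is lossless.
from __future__ import print_function

TEXT = u"Привет мир!"  # the value of <b> in 1.xml ("Hello world!")


def try_encode(text, encoding):
    """Hypothetical helper: return encoded bytes, or None if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        print("%s failed: %s" % (encoding, exc))
        return None


ascii_bytes = try_encode(TEXT, "ascii")  # same kind of error as the traceback
utf8_bytes = try_encode(TEXT, "utf-8")   # 2 bytes per Cyrillic letter

print(ascii_bytes is None)                 # True
print(len(TEXT), len(utf8_bytes))          # 11 20
print(utf8_bytes.decode("utf-8") == TEXT)  # True
```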
I understand that the problem comes from "Привет мир!" ("Hello world!") not being a valid xs:decimal; these are only short examples. Sorry for my English. Thank you.