I have to parse XML files that start like this:
xml_string = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<annotationStandOffs xmlns="http://www.tei-c.org/ns/1.0">
<standOff>
...
</standOff>
</annotationStandOffs>
'''
The following code only works if I remove the first line (the XML declaration) from the string shown above:
from lxml import etree
parser = etree.XMLParser(resolve_entities=False, strip_cdata=False, recover=True)
XML_tree = etree.XML(xml_string, parser=parser)
Otherwise I get the error:
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
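The error message points at the cause. lxml refuses a Python `str` that still carries an `encoding="..."` declaration, because the text has already been decoded and the declaration could contradict it. The check happens before parsing, so `recover=True` does not bypass it. Below is a minimal sketch that encodes the string to bytes before parsing. It assumes the files really are UTF-8 as declared, and the empty `standOff` element stands in for the elided content:

```python
from lxml import etree

# Same header as in the question; the empty standOff is a placeholder for "..."
xml_string = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<annotationStandOffs xmlns="http://www.tei-c.org/ns/1.0">
<standOff>
</standOff>
</annotationStandOffs>
'''

parser = etree.XMLParser(resolve_entities=False, strip_cdata=False, recover=True)

# Passing the str directly reproduces the ValueError, even with recover=True.
try:
    etree.XML(xml_string, parser=parser)
    raised = False
except ValueError:
    raised = True

# Encoding to bytes makes the declared encoding (UTF-8) match the input,
# so the declaration can stay in place.
XML_tree = etree.XML(xml_string.encode("utf-8"), parser=parser)

TEI = "{http://www.tei-c.org/ns/1.0}"
print(XML_tree.tag)                            # root tag in Clark notation
print(XML_tree.find(TEI + "standOff") is not None)
```

If the XML comes straight from files, you can avoid the decode/encode round trip entirely. Either read the files in binary mode (`open(path, "rb")`) or pass the path to `etree.parse(path, parser)`.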