Say I have an expat parser instantiated like so:
import xml.parsers.expat

def on_character_data(data):
    print(data)

parser = xml.parsers.expat.ParserCreate(encoding=encoding)
...
parser.CharacterDataHandler = on_character_data
...
And an XML document like so:
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
</head>
<body>
ampersands &amp; other annoyances
</body>
</html>
If I call parser.Parse(test_xml_string), the handler on_character_data() receives the string "ampersands &amp; other annoyances" as "ampersands & other annoyances", with the &amp; replaced by &. I want expat to leave these entities alone, so that on_character_data() receives the unmodified "ampersands &amp; other annoyances". Is there any way I can do this?
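For reference, here is a minimal self-contained sketch that reproduces the behaviour. The document is a shortened stand-in for the one above, and the `received` list and `text` variable are only there for illustration; they are not part of my real code. Since expat may deliver character data in several chunks, the sketch collects the pieces and joins them before printing.

```python
import xml.parsers.expat

# Shortened stand-in for the document above (assumption: only the
# body text matters for reproducing the entity expansion).
test_xml_string = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<body>ampersands &amp; other annoyances</body>'
)

received = []  # hypothetical collector for the chunks passed to the handler


def on_character_data(data):
    # expat may call this several times for one run of text,
    # so the chunks are stored and joined afterwards.
    received.append(data)


parser = xml.parsers.expat.ParserCreate()
parser.CharacterDataHandler = on_character_data
parser.Parse(test_xml_string, True)  # True marks this as the final chunk of input

text = "".join(received)
print(text)  # the entity has already been expanded to a bare ampersand
```

Running this shows the handler only ever sees the expanded `&`, never the original `&amp;`.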