I have a set of XML files, and parsing some of them with lxml fails with an invalid-character error. Example:
lxml.etree.XMLSyntaxError: invalid character in attribute value, line 4, column 41976
I've read a lot about this and tried several suggestions, but nothing helped. I have two questions:
1- I read all the XML files in a folder; most of them parse fine, but the script stops at the first one that fails. How can I skip the failing file and keep going in Python?
2- How can I fix the input files that trigger the error?
Sample code:
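import sys
from lxml import etree

# file is the name of one XML file inside the folder given as sys.argv[1]
tree = etree.parse(sys.argv[1] + file)
for extraction in tree.findall("TIMEX3"):
    value = ""
    for token in extraction.findall("TOKEN"):
        # token.text is None for an empty element, so fall back to ""
        value = value + " " + (token.text or "")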
error:
lxml.etree.XMLSyntaxError: invalid character in attribute value, line 4, column 41976
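For question 1, the behaviour I am after looks roughly like the sketch below: catch `etree.XMLSyntaxError` per file, record it, and move on. The helper name `parse_all` and the `*.xml` glob pattern are assumptions made for illustration, not part of my actual script.

```python
from pathlib import Path

from lxml import etree


def parse_all(folder):
    """Parse every .xml file in folder (hypothetical helper).

    Returns two dicts keyed by file name: parsed trees and error messages.
    """
    trees, failures = {}, {}
    for path in sorted(Path(folder).glob("*.xml")):
        try:
            trees[path.name] = etree.parse(str(path))
        except etree.XMLSyntaxError as exc:
            # Remember the message and continue with the next file.
            failures[path.name] = str(exc)
    return trees, failures
```

Calling `parse_all(sys.argv[1])` would then give the trees to process plus a list of files that still need fixing. lxml also offers `etree.XMLParser(recover=True)`, which tries to parse through broken input, but it may silently drop content, so I would rather know which files are bad.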
I tried this: https://gist.github.com/lawlesst/4110923 but it didn't work. It actually caused problems with the files that previously parsed correctly as well.
I also checked character 41976 on line 4, and it looks like a perfectly valid character:
head -4 file.xml | tail -1 | head -c 41977
These are the last characters of the output:
numchild="0" numbsibling="0"
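Since `head -c` prints raw bytes and a terminal may not show control characters, a byte-level check might be more reliable than eyeballing the output. The sketch below lists control bytes that XML 1.0 does not allow (anything below 0x20 other than tab, line feed, and carriage return). The function name `find_control_bytes` is hypothetical, and I am not sure the column lxml reports lines up exactly with a byte offset, so it scans the whole file instead of one position.

```python
def find_control_bytes(path):
    """Return (line, byte_offset, byte_value) for each forbidden control byte.

    Line and offset are 1-based; the offset counts bytes, not characters.
    """
    allowed = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return
    hits = []
    with open(path, "rb") as handle:
        for line_no, raw in enumerate(handle, start=1):
            for offset, byte in enumerate(raw, start=1):
                if byte < 0x20 and byte not in allowed:
                    hits.append((line_no, offset, byte))
    return hits
```

An empty result would mean the problem is something other than a raw control byte, for example a character reference or an encoding mismatch.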
Thanks.