
Some of my XML files fail to parse with an invalid-character error. Example:

lxml.etree.XMLSyntaxError: invalid character in attribute value, line 4, column 41976

I've read a lot about this and tried many suggestions, but nothing has helped. I'd like answers to two questions:

1- I read all the XML files in a folder. Most of them parse, but on some of them the script stops. How can I skip the failing files in Python and keep going?
2- How can I fix the input files that give me this error?

Sample code:

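import sys
from lxml import etree

# `file` is the current file name, taken from a loop over the folder (not shown)
tree = etree.parse(sys.argv[1] + file)
for extraction in tree.findall("TIMEX3"):
    value = ""
    for token in extraction.findall("TOKEN"):
        # token.text is None for an empty <TOKEN/>; using "" avoids a TypeError
        value = value + " " + (token.text or "")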
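A sketch of the same extraction as a reusable function, assuming (as the sample does) that the TIMEX3 elements are direct children of the root. It only uses `findall` and `.text`, which lxml and the standard library's `xml.etree.ElementTree` share, so it can be tried with either. The name `timex_values` is hypothetical.

```python
def timex_values(root):
    """Return one string per TIMEX3 child of `root`, joining its TOKEN texts with spaces.

    Works on any ElementTree-API element (lxml or xml.etree.ElementTree).
    """
    values = []
    for extraction in root.findall("TIMEX3"):
        # An empty <TOKEN/> has .text == None, so substitute "" before joining
        words = [token.text or "" for token in extraction.findall("TOKEN")]
        values.append(" ".join(words))
    return values
```

With lxml this would be called as `timex_values(etree.parse(path).getroot())`. Unlike the original loop, `" ".join` does not leave a leading space at the start of each value.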

Running it on one of the failing files gives:

lxml.etree.XMLSyntaxError: invalid character in attribute value, line 4, column 41976 

I tried https://gist.github.com/lawlesst/4110923 but it didn't work. It actually caused problems with the files that were already correct.
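A narrower sketch (not the gist's code): remove only the characters that the XML 1.0 `Char` production disallows, and apply it only to files that already failed to parse, so files that parse cleanly are never modified. The helper name `strip_invalid_xml_chars` is hypothetical, and the approach assumes the bad character really is one of these disallowed code points.

```python
import re

# Characters allowed by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# This pattern matches any single character outside that set.
_NOT_XML_CHAR = re.compile(r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")


def strip_invalid_xml_chars(text):
    """Return `text` with every character that XML 1.0 forbids removed."""
    return _NOT_XML_CHAR.sub("", text)
```

Assuming the file is UTF-8, read it with `open(path, encoding="utf-8")`, clean the text, and parse it with `etree.fromstring(cleaned.encode("utf-8"))`. Passing bytes matters because lxml refuses a `str` that starts with an encoding declaration. Note that `fromstring` returns the root element, not a tree. If nothing was removed and parsing still fails, the problem is something other than a disallowed character.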

I also checked column 41976 of line 4, and the character there looks perfectly normal:

head -4 file.xml | tail -1 | head -c 41977 

These are the last characters of the output:

numchild="0" numbsibling="0" 
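Many control characters (for example `\x00` or `\x1f`) print as nothing visible in a terminal, so `head` output can look clean even when the offending byte is there. A minimal sketch that returns the raw bytes around the reported position so they can be shown with `repr()`. The helper name `context_bytes` is hypothetical, and it assumes the reported column is roughly a 1-based offset into the line; it returns a window on both sides because the column may not line up exactly with byte positions.

```python
def context_bytes(path, line_no, column, width=30):
    """Return the raw bytes of line `line_no` (1-based) within `width` of `column`.

    Assumes `column` is an approximate 1-based offset into that line.
    """
    with open(path, "rb") as fh:  # binary mode: nothing is decoded or hidden
        for current, line in enumerate(fh, start=1):
            if current == line_no:
                centre = column - 1
                return line[max(centre - width, 0):centre + width]
    raise ValueError(f"{path} has fewer than {line_no} lines")
```

For the error above this would be `print(repr(context_bytes("file.xml", 4, 41976)))`; any `\x..` escape in the printed bytes is a candidate for the invalid character.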

Thanks.

user3430235
  • For 2, it's hard to say without knowing what the invalid character is (or I guess, what lxml *thinks* the invalid character is). For 1, you could just wrap the processing inside a `try` block, and add an `except` handler for `lxml.etree.XMLSyntaxError` where you print a note, and continue on your way. Make sure this try block is *inside* the loop you're using to iterate over your filenames. – jedwards Mar 15 '15 at 02:40
  • I tried `try: tree = etree.parse(sys.argv[1]+file) except etree.XMLSyntaxError as e: print e`, and it works now! – user3430235 Mar 15 '15 at 03:35
  • And you did `from lxml import etree`? Does what you tried not work? – jedwards Mar 15 '15 at 03:39
  • Now I can skip the files with errors, but I still can't do anything with them. – user3430235 Mar 15 '15 at 03:41
  • Right, you'll need to address the root cause of the exception in the first place. The try/except block was just to allow you to continue processing the other files despite running into an exception. – jedwards Mar 15 '15 at 03:44
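A sketch of the try/except-inside-the-loop approach from the comments, which also records each failure message so the skipped files can be inspected later. `parse_folder` is a hypothetical name, and the sketch assumes the input files end in `.xml`.

```python
import os

from lxml import etree


def parse_folder(folder, parser=None):
    """Parse every *.xml file in `folder`, skipping (but recording) files that fail.

    Returns (trees, failures): `trees` maps file name -> parsed tree and
    `failures` maps file name -> the XMLSyntaxError message.
    """
    trees, failures = {}, {}
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".xml"):
            continue
        path = os.path.join(folder, name)
        # The try block is inside the loop, so one bad file does not stop the others.
        try:
            trees[name] = etree.parse(path, parser)
        except etree.XMLSyntaxError as err:
            failures[name] = str(err)
    return trees, failures
```

Usage would be `trees, failures = parse_folder(sys.argv[1])`, then printing `failures` to see which files were skipped and why. lxml's `etree.XMLParser(recover=True)` can be passed as `parser` to ask libxml2 to recover from errors instead of raising; treat that as a last resort, since the recovered tree may silently lose or alter content.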
