xmltodict fails on the first line of an xml file

Question

This line in a Python script:

result = xmltodict.parse('path/to/schema.xml', encoding='utf-8')

generates this error:

johnnyb@verahost ~/SignalDB $ python3 xmltest.py
Traceback (most recent call last):
  File "xmltest.py", line 13, in <module>
    result = xmltodict.parse('path/to/schema.xml', encoding='utf-8')
  File "/home/johnnyb/.local/lib/python3.5/site-packages/xmltodict.py", line 330, in parse
    parser.Parse(xml_input, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 8

The first line of the file is:

ï»¿<?xml version="1.0" encoding="utf-8" standalone="yes"?>

What am I missing? Note: the BOM at the beginning does not show up via `head` at the Linux command line (the text above is from Win10). Suggestions welcome! Never had to monkey with XML before, and my luck ended today...

EDIT: I was able to get around it by open()ing the file first, but this seems like it should be unnecessary?

with open('path/to/schema.xml', 'r', encoding='utf-8') as fd:
    result = xmltodict.parse(fd.read())
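
A possible explanation for why the open() workaround helps: as far as the xmltodict documentation goes, parse() expects the XML itself (a string or a file-like object), not a filename, so the path string would be handed to expat as if it were the document. The sketch below is only an illustration under that assumption. It calls the standard-library expat parser directly (the same parser that appears in the traceback) so it runs without extra packages. The helper root_name, the temporary file, and its contents are hypothetical stand-ins, not the real schema.xml.

```python
import os
import tempfile
from xml.parsers import expat


def root_name(xml_input):
    """Parse a str, bytes, or file-like object with expat; return the root tag."""
    names = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    if hasattr(xml_input, "read"):
        parser.ParseFile(xml_input)    # file-like object
    else:
        parser.Parse(xml_input, True)  # the XML text itself
    return names[0]


# Hypothetical file contents: UTF-8 BOM, XML declaration, one empty element.
data = (b'\xef\xbb\xbf<?xml version="1.0" encoding="utf-8" standalone="yes"?>\n'
        b'<schema/>')
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

# 1. Passing the path: the path string itself is parsed as XML and rejected.
try:
    root_name(path)
    path_rejected = False
except expat.ExpatError:
    path_rejected = True

# 2. Passing a file opened in binary mode: the parser handles the encoding.
with open(path, "rb") as fd:
    from_file = root_name(fd)

# 3. Reading the text first; "utf-8-sig" drops a leading BOM while decoding.
with open(path, "r", encoding="utf-8-sig") as fd:
    text = fd.read()
from_text = root_name(text)

os.remove(path)
```

With xmltodict, the equivalent of cases 2 and 3 would be passing the open file object or the text read from it to xmltodict.parse(), which is what the workaround above already does.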

Yeah opening the file should be unnecessary - the XML is either well-formed or it isn't. What happens when you delete the BOM at the top of the file? — alex, Nov 10 '17 at 21:05
Also, have you tried using some service to validate that the XML is well-formed prior to trying to parse it out? — alex, Nov 10 '17 at 21:15
@alex I did not in fact try removing the BOM. The xml lints fine in atom so I didn't bother trying elsewhere. I will try both on Monday morning. — Omortis, Nov 11 '17 at 20:02
@Omortis Hey, just wanted to know if you got it working in the end? — DarkFranX, May 31 '19 at 13:21
@DarkFranX many many apologies, I missed your reply - the XML I was receiving was indeed poorly formatted (contained no encoding line). `xmltodict` is picky about that. Sorry, moved on a year ago... — Omortis, Jul 23 '19 at 21:22
