I'm trying to get a gzipped XML file from an FTP server, parse the XML, and pull out data using XPath expressions, all without storing any files on disk. The code I've got is:
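import gzip
import io
from ftplib import FTP
from lxml import etree
ftp = FTP()
ftp.connect(hostname)
ftp.login(user, password)
flo = io.BytesIO()  # in-memory buffer for the download
ftp.retrbinary('RETR myfile.xml.gz', flo.write)
flo.seek(0)
uncompressed = gzip.decompress(flo.read())
# This is the call that fails:
tree = etree.parse(uncompressed, etree.XMLParser(encoding='utf-8', ns_clean=True, recover=True))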
Everything works up to the etree.parse() call. That call fails and prints the contents of the XML file to the screen, preceded by:
OSError: Error reading file 'b'<?xml version="1.0" ...
and ending with
failed to load external entity "b'<?xml version="1.0" encoding="UTF-8"?><merchandiser xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNam
If I write the uncompressed file to disk first and then load it back in, the parse call works. I've also tried a parser created with resolve_entities=False, but the output does not change.
I've seen posts such as "Error 'failed to load external entity' when using Python lxml"; however, they deal with parsing a string with etree.parse(), whereas I'm dealing with a bytes object:
type(uncompressed)
<class 'bytes'>
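For reference, here is a minimal sketch of the in-memory pipeline I'm after, without the FTP server. Two things in it are assumptions and not part of my real setup: the sample XML bytes are made up, and the standard library's xml.etree.ElementTree stands in for lxml so the snippet runs without extra dependencies. The difference from the code above is that the parser is handed a file-like object (a gzip.GzipFile wrapping the BytesIO) and not a bytes value. lxml's etree.parse() is documented as taking a file name, URL, or file-like object, which might explain why bytes get reported as an unloadable "external entity".

```python
import gzip
import io
import xml.etree.ElementTree as ET  # stand-in for lxml.etree in this sketch

# Hypothetical sample data standing in for the file fetched over FTP.
sample_xml = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<merchandiser><product id="1"/><product id="2"/></merchandiser>'
)
flo = io.BytesIO(gzip.compress(sample_xml))  # plays the role of the retrbinary buffer

flo.seek(0)
# GzipFile is itself file-like, so parse() can read the decompressed data from it.
with gzip.GzipFile(fileobj=flo) as unzipped:
    tree = ET.parse(unzipped)

# Simple path query; lxml would allow full XPath via tree.xpath(...).
product_ids = [p.get("id") for p in tree.getroot().findall(".//product")]
print(product_ids)
```

If the data is already fully decompressed, as in my code, wrapping it as io.BytesIO(uncompressed) should give the parser the same kind of file-like input.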
Any help is much appreciated. Thanks