
Why does this code work without issues on my Mac with any version of Python, requests and lxml, but fail in every Docker container I try? I've tried everything.

It fails at line 34533 of the file (I found this by printing el.sourceline).

from requests import get
from lxml import etree

# Download the feed and save the raw bytes unchanged
r = get('https://printbar.ru/synsfiles/yandex/market/idrr_full.xml')
with open('test.xml', 'wb') as f:
    f.write(r.content)

# Stream through the file; only whether the parse completes matters here
tree = etree.iterparse(source='test.xml', events=('end',))
for (ev, el) in tree:
    continue

print('ok')
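A quick way to narrow this down is to check whether the saved bytes actually match the encoding named in the XML declaration. The sketch below uses only the standard library; the helper names (declared_encoding, decodes_as, diagnose) are hypothetical, and Python's codecs are only a stand-in for the conversion libxml2 performs, so agreement between the two is suggestive rather than proof.

```python
import re
from typing import Optional

# Pulls the encoding name out of a leading <?xml ... encoding="..."?> declaration
_DECL_RE = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, or None if absent."""
    if data.startswith(b'\xef\xbb\xbf'):  # skip a UTF-8 byte-order mark
        data = data[3:]
    m = _DECL_RE.match(data)
    return m.group(1).decode('ascii') if m else None


def decodes_as(data: bytes, encoding: str) -> bool:
    """True if every byte decodes under `encoding` in strict mode."""
    try:
        data.decode(encoding)
    except (UnicodeDecodeError, LookupError):  # bad bytes or unknown codec name
        return False
    return True


def diagnose(data: bytes) -> dict:
    """Compare the declared encoding with what the bytes actually decode as."""
    enc = declared_encoding(data)
    return {
        'declared': enc,
        'valid_as_declared': decodes_as(data, enc) if enc else None,
        'valid_utf8': decodes_as(data, 'utf-8'),
    }
```

Run it as diagnose(open('test.xml', 'rb').read()) on both machines. If valid_utf8 is True while valid_as_declared is False, the declaration is probably wrong for the content.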

The file at https://printbar.ru/synsfiles/yandex/market/idrr_full.xml appears to be completely valid, and it parses fine locally on all of my Macs.
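Since the same script and file behave differently on the Mac and in the containers, one thing worth comparing is the libxml2 build that lxml runs against, because libxml2 hands non-built-in encodings such as windows-1251 to an external converter (usually iconv). A possible, unverified explanation is that the macOS build tolerates bytes that the Linux build rejects. This hypothetical helper collects the relevant versions; it uses lxml's LXML_VERSION, LIBXML_VERSION and LIBXML_COMPILED_VERSION attributes and falls back gracefully if lxml is missing.

```python
import platform
import sys


def environment_report() -> dict:
    """Collect version info that may differ between the Mac and the container."""
    report = {
        'python': sys.version.split()[0],
        'platform': platform.platform(),
    }
    try:
        from lxml import etree
    except ImportError:
        report['lxml'] = None  # lxml is not installed in this interpreter
        return report
    # Version tuples exposed by lxml.etree: lxml itself, the libxml2 it runs
    # against, and the libxml2 it was compiled against
    report['lxml'] = etree.LXML_VERSION
    report['libxml2_runtime'] = etree.LIBXML_VERSION
    report['libxml2_compiled'] = etree.LIBXML_COMPILED_VERSION
    return report


if __name__ == '__main__':
    for key, value in environment_report().items():
        print(f'{key}: {value}')
```

Running this on the Mac and inside the container and diffing the output shows whether the two are even using the same libxml2.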

I tried Ubuntu, Alpine and several Python images, even ones with prebuilt lxml; nothing helped. I expected the file to parse cleanly, but instead this error is raised partway through parsing:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "src/lxml/iterparse.pxi", line 210, in lxml.etree.iterparse.__next__
  File "src/lxml/iterparse.pxi", line 195, in lxml.etree.iterparse.__next__
  File "src/lxml/iterparse.pxi", line 230, in lxml.etree.iterparse._read_more_events
  File "src/lxml/parser.pxi", line 1376, in lxml.etree._FeedParser.feed
  File "src/lxml/parser.pxi", line 606, in lxml.etree._ParserContext._handleParseResult
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "test.xml", line 1
lxml.etree.XMLSyntaxError: Document is empty, line 1, column 1

xmllint reports an encoding error, yet the same file parses fine locally on the Mac. How is that possible? I want this to work in Docker!
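To see where the encoding problem that xmllint reports actually sits, and whether it lines up with line 34533, the sketch below finds the first byte that strict decoding with a given codec rejects and converts that offset to a line number and a 1-based byte column. first_decode_error is a hypothetical helper built on Python's codecs, which may not agree byte-for-byte with libxml2/iconv.

```python
from typing import Optional, Tuple


def first_decode_error(data: bytes, encoding: str) -> Optional[Tuple[int, int, int]]:
    """Return (byte_offset, line, byte_column) of the first byte that `encoding`
    cannot decode in strict mode, or None if the whole buffer decodes."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        offset = exc.start
        line = data.count(b'\n', 0, offset) + 1
        line_start = data.rfind(b'\n', 0, offset) + 1  # 0 when on the first line
        return offset, line, offset - line_start + 1
    return None
```

For example, call first_decode_error(open('test.xml', 'rb').read(), 'windows-1251'). As a concrete case, byte 0x98 is undefined in Python's cp1251 codec, yet it occurs in ordinary UTF-8 Cyrillic text (the letter И is encoded as D0 98), so a UTF-8 file declared as windows-1251 can trip a strict decoder at the first such letter.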

Van4ozA
  • Check what r.status_code is. – balderman Jan 12 '23 at 18:45
  • It has nothing to do with `r.status_code` -- the HTTP request completes successfully, and the error reproduces if you download the file externally and attempt to parse it. – larsks Jan 12 '23 at 18:47
  • Is the encoding in the file correct? It says, `windows-1251`, but if I replace that with `utf8` your script parses the file without errors. – larsks Jan 12 '23 at 18:48
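Following up on the observation that changing the declared encoding to UTF-8 makes the file parse, here is a minimal sketch of that workaround done in code rather than by hand. It assumes the downloaded bytes really are UTF-8 and refuses to touch them otherwise; force_utf8_declaration is a hypothetical helper, not part of lxml.

```python
import re

# Matches the encoding pseudo-attribute in a leading <?xml ...?> declaration,
# optionally preceded by a UTF-8 byte-order mark
_ENC_RE = re.compile(rb'^((?:\xef\xbb\xbf)?<\?xml[^>]*?encoding\s*=\s*)(["\'])[A-Za-z0-9._-]+\2')


def force_utf8_declaration(data: bytes) -> bytes:
    """Rewrite the declared encoding to UTF-8, but only if the bytes are UTF-8."""
    data.decode('utf-8')  # raises UnicodeDecodeError if the premise is wrong
    # Keep the original quote character; replace only the encoding name
    return _ENC_RE.sub(rb'\1\2UTF-8\2', data, count=1)
```

The result can be wrapped in io.BytesIO and passed to etree.iterparse in place of the filename. Alternatively, lxml's iterparse accepts an encoding keyword that overrides the document's declared encoding (for example encoding='utf-8'), which avoids rewriting the bytes; I have not verified either approach against this particular feed.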
