2

I am trying to parse xml document (URL) using requests,

facing following error:

ValueError: Unicode strings with encoding declaration are not supported

here is my code:

import requests
from lxml import etree
from lxml.etree import fromstring

req = requests.request('GET', "http://www.nbp.pl/kursy/xml/LastC.xml")

a = req.text
b = etree.fromstring(a)

how can i get this xml parsed. Thanks in advance for help

  • Have you looked at this post? http://stackoverflow.com/questions/15830421/xml-unicode-strings-with-encoding-declaration-are-not-supported – Alexey Gorozhanov Apr 04 '15 at 11:41
  • @AlexeyGorozhanov i tried it.. not working for me! throws same error –  Apr 04 '15 at 11:42
  • @quikrr: it cannot possibly throw the same error if you actually used `req.content`; are you 100% certain you used the right method there? – Martijn Pieters Apr 04 '15 at 12:00

1 Answers1

4

You are passing in the Unicode decoded version. Don't do that, XML parsers require that you pass in the raw bytes instead.

Instead of req.text, use req.content here:

a = req.content
b = etree.fromstring(a)

You can also stream the XML document to the parser:

req = requests.get("http://www.nbp.pl/kursy/xml/LastC.xml", stream=True)
req.raw.decode_content = True  # ensure transfer encoding is honoured
b = etree.parse(req.raw)
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343