I need to make an HTTP POST to a specific URL to run a search (in this case the search is only available via POST), with some header fields set and some data in the body of the request. The server responds with a continuous stream of XML until the search results end. When I print out the response for a large query, I run into a "Memory Error".

I was looking at Python's Requests library. How can I stream the POST response without running out of memory?

I basically need to parse the XML and then write the results to a file. What is the best way to achieve this?
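One way the streaming part could look is sketched below. It assumes the values of interest sit in leaf elements with a known tag; the names `write_matching_text`, `stream_search`, `url`, `headers`, `body`, `out_path` and `tag` are hypothetical and not taken from the question. The response is requested with `stream=True` so that `requests` does not load the whole body, and `xml.etree.ElementTree.iterparse()` reads from the raw response object piece by piece.

```python
import xml.etree.ElementTree as ET


def write_matching_text(stream, out_file, tag):
    """Incrementally parse XML from a file-like object and write the text of
    every <tag> element to out_file, one value per line. Returns the count."""
    context = iter(ET.iterparse(stream, events=("start", "end")))
    _event, root = next(context)  # the first event is the start of the root
    count = 0
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            out_file.write((elem.text or "") + "\n")
            count += 1
            root.clear()  # detach already-processed children from the root
    return count


def stream_search(url, headers, body, out_path, tag):
    """POST the search and parse the response while it is still arriving."""
    import requests  # third-party; imported here so the parser above has no dependency on it

    with requests.post(url, headers=headers, data=body, stream=True) as resp:
        resp.raise_for_status()
        resp.raw.decode_content = True  # transparently undo gzip/deflate encoding
        with open(out_path, "w", encoding="utf-8") as out_file:
            return write_matching_text(resp.raw, out_file, tag)
```

Calling `root.clear()` after each match is what keeps the in-memory tree from growing with the size of the response; if whole records with children are needed rather than leaf values, the same pattern can be applied at the record element instead.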

Martijn Pieters
user720694
  • See my answer on the linked duplicate question: the section that starts with *or, if the response is particularly large, use an incremental approach* details how to incrementally parse XML from a URL with `requests`. – Martijn Pieters Jul 05 '15 at 01:00
  • @MartijnPieters So the `requests.get()` approach also works for POST requests with headers and a body? – user720694 Jul 05 '15 at 01:59
  • Yes, it doesn't matter which verb was used (`GET`, `POST`, etc.; the only difference is the size of the request body) or whether additional headers were set. – Martijn Pieters Jul 05 '15 at 02:15
  • @MartijnPieters Thanks. One last question: I need to maintain high throughput, so I might want to dump the incoming stream into a file and run a separate thread to process the dumped XML. However, I might run into a producer-consumer problem if I do so. Which approach do you recommend: parsing immediately as I receive the stream, or parsing separately? – user720694 Jul 05 '15 at 02:27
  • Sorry, no idea there; try out both approaches and see what works best for your use cases. – Martijn Pieters Jul 05 '15 at 02:28
  • @MartijnPieters Can ElementTree handle incomplete XML? I need to get the values of specific tags from the large file. The XML might be malformed, i.e. an end tag might not be present yet while the file is still being written. – user720694 Jul 05 '15 at 07:15
  • `iterparse()` parses the XML as it streams in, which by definition is incomplete. Malformed XML cannot be parsed. – Martijn Pieters Jul 05 '15 at 12:34
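
For the producer-consumer variant raised in the comments, a minimal sketch follows. It is an illustration under stated assumptions, not a recommendation from the thread: the helper names (`produce`, `consume`, `run`), the sentinel and the queue size are all hypothetical. One thread puts raw chunks on a bounded `queue.Queue` while another feeds them to `xml.etree.ElementTree.XMLPullParser`, which accepts incomplete input and only reports an error when the XML is actually malformed or the stream ends before the document does.

```python
import queue
import threading
import xml.etree.ElementTree as ET

_DONE = object()  # sentinel that marks the end of the stream


def produce(chunks, q):
    """Put each raw chunk on the queue, then signal completion."""
    for chunk in chunks:
        q.put(chunk)  # blocks while the queue is full, throttling the producer
    q.put(_DONE)


def consume(q, tag, results, errors):
    """Feed chunks to a pull parser and collect the text of each <tag>."""
    parser = ET.XMLPullParser(events=("end",))
    failed = False
    while True:
        chunk = q.get()
        if chunk is _DONE:
            break
        if failed:
            continue  # keep draining so the producer never blocks forever
        try:
            parser.feed(chunk)
            for _event, elem in parser.read_events():
                if elem.tag == tag:
                    results.append(elem.text or "")
                    elem.clear()  # drop the element's content once used
        except ET.ParseError as exc:  # malformed XML seen so far
            errors.append(exc)
            failed = True
    if not failed:
        try:
            parser.close()  # fails if the document ended prematurely
        except ET.ParseError as exc:
            errors.append(exc)


def run(chunks, tag, max_chunks=8):
    """Parse an iterable of byte chunks on a worker thread."""
    q = queue.Queue(maxsize=max_chunks)
    results, errors = [], []
    worker = threading.Thread(target=consume, args=(q, tag, results, errors))
    worker.start()
    produce(chunks, q)
    worker.join()
    return results, errors
```

With `requests`, the `chunks` argument could be something like `resp.iter_content(chunk_size=16384)` on a response opened with `stream=True`. Whether the extra thread improves throughput over parsing inline depends on the workload, so it is worth measuring both.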

0 Answers