40

I am trying to get the raw data of an HTTP response body when using requests in Python. I want to forward the response through another channel, so ideally the content should be as pristine as possible.

What would be a good way to do this?

TAbdiukov
Juan Carlos Coto

4 Answers

37

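After requests.get(), r.content returns the response body as bytes. Note that by this point requests has already decoded any gzip or deflate Content-Encoding, and that stream=True is not required for r.content to work.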

r = requests.get('https://yourweb.com', stream=True)
r.content
William
35

If you are using requests.get to obtain the HTTP response, you can use the raw attribute of the response object. The following example is from the requests docs. Passing stream=True to requests.get is required for this to work.

>>> r = requests.get('https://github.com/timeline.json', stream=True)
>>> r.raw
<urllib3.response.HTTPResponse object at 0x101194810>
>>> r.raw.read(10)
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'
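
To make the difference from r.content concrete, here is a minimal sketch. The helper name fetch_wire_bytes and the timeout value are my own assumptions, not part of requests. It reads the body through r.raw with decode_content=False, so a gzip-compressed payload stays compressed and still matches the Content-Encoding header. Chunked transfer framing, if any, has already been removed by the time you read from r.raw.

```python
import requests


def fetch_wire_bytes(url, timeout=10):
    """Return (headers, body) with the body left in its transferred encoding.

    Hypothetical helper for illustration; not part of the requests API.
    """
    with requests.get(url, stream=True, timeout=timeout) as r:
        # decode_content=False asks urllib3 not to undo gzip/deflate,
        # so the bytes still agree with the Content-Encoding header.
        body = r.raw.read(decode_content=False)
        return dict(r.headers), body
```

If the server sent gzip, the returned body starts with the gzip magic bytes shown above, whereas r.content for the same URL would give the decompressed payload.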
Blair
4

To add to @Blair's answer, as stated in the docs:

In general, however, you should use a pattern like this to save what is being streamed to a file:

r = requests.get('https://yourweb.com', stream=True)

with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)

Using Response.iter_content will handle a lot of what you would otherwise have to handle when using Response.raw directly. When streaming a download, the above is the preferred and recommended way to retrieve the content. Note that chunk_size can be freely adjusted to a number that may better fit your use cases.

That pattern not only has the advantages described above, but is also a good way to fetch data in environments with limited memory, because the body is never held in memory all at once.
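
Since the question is about forwarding rather than saving to disk, the same pattern can be pointed at any writable binary object. The following is only a sketch: forward_body, its decoded flag, and the default chunk size are names and values I made up for illustration. With decoded=True it uses iter_content, which undoes gzip/deflate; with decoded=False it streams from r.raw and leaves the bytes in their transferred encoding.

```python
import requests


def forward_body(url, sink, chunk_size=8192, decoded=True, timeout=10):
    """Copy a response body into `sink` chunk by chunk; return bytes written.

    Hypothetical helper; `sink` is any object with a binary write() method.
    """
    written = 0
    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()
        if decoded:
            # iter_content undoes gzip/deflate Content-Encoding.
            chunks = r.iter_content(chunk_size=chunk_size)
        else:
            # Stream straight from urllib3 without decoding the content.
            chunks = r.raw.stream(chunk_size, decode_content=False)
        for chunk in chunks:
            if chunk:
                sink.write(chunk)
                written += len(chunk)
    return written
```

The sink could be an open file, an io.BytesIO, or the object returned by socket.makefile('wb'). If you forward undecoded bytes, remember to forward the Content-Encoding header along with them.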

Blair
rodrigo-silveira
1

The following is an easy way to recreate the whole HTTP response, including the HTTP header's initial Status Line:

r = requests.get('https://yourweb.com/', stream=True)
print(f"HTTP/{r.raw.version/10} {r.raw.status} {r.raw.reason}")
for k,v in r.raw.headers.items(): print(f"{k}: {v}")
print(r.text)

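This may not be 100% pristine, but it should be very close: r.text is the body decoded to text (with any gzip or deflate encoding already undone), and print() ends lines with \n rather than the \r\n that HTTP uses. You can use print()'s file= parameter to redirect the output to a file.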
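
If you need bytes rather than printed text, a variation on the same idea is sketched below. serialize_response is a hypothetical helper, not a requests function. It joins the status line and headers with \r\n and appends the body read from r.raw without content decoding. It is still an approximation: repeated headers are merged by the header mapping, and if the server used chunked transfer encoding the body has already been de-chunked even though the Transfer-Encoding header is still listed.

```python
import requests


def serialize_response(r):
    """Rebuild an HTTP/1.x-style message as bytes from a streamed response.

    Hypothetical helper; expects a response obtained with stream=True
    whose body has not been read yet.
    """
    raw = r.raw
    # urllib3 reports the version as an int, e.g. 11 for HTTP/1.1.
    version = f"HTTP/{raw.version // 10}.{raw.version % 10}"
    lines = [f"{version} {raw.status} {raw.reason}"]
    lines += [f"{k}: {v}" for k, v in raw.headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"
    # Leave gzip/deflate untouched so the body matches Content-Encoding.
    body = raw.read(decode_content=False)
    return head.encode("iso-8859-1") + body
```

Usage would look like r = requests.get(url, stream=True) followed by serialize_response(r), with the result written to a file opened in 'wb' mode or sent over a socket.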