47

I need to make an HTTP request and determine the response size in bytes. I have always used `requests` for simple HTTP requests, but I am wondering whether I can achieve this using `raw`.

>>> r = requests.get('https://github.com/', stream=True)
>>> r.raw

My only problem is that I don't understand what `raw` returns or how I could measure that data type in bytes. Is using `requests` and `raw` the right approach?

ewhitt
  • Note that the only way to get the size before downloading the entire file is to read the `content-length` header, if it exists. – cowlinator Feb 25 '20 at 04:00
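
As the comment above notes, the `Content-Length` header is the only size information available before the body is downloaded. A minimal sketch of reading it defensively follows; `declared_size` is a hypothetical helper name, not part of requests, and it works on any mapping of header names to string values. For a compressed response the header describes the compressed bytes, not the decoded body.

```python
def declared_size(headers):
    """Return the Content-Length value as an int, or None if absent or malformed."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        size = int(value)
    except ValueError:
        # The header is present but not a number; treat it as unknown.
        return None
    return size if size >= 0 else None


# With requests (network access assumed), the headers arrive before the
# body when stream=True, so nothing large has been downloaded yet:
#
#     import requests
#     with requests.get("https://github.com/", stream=True) as r:
#         print(declared_size(r.headers))
```

A result of `None` means the server did not declare a size, and the body has to be read to measure it.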

2 Answers

72

Just take the len() of the content of the response:

>>> response = requests.get('https://github.com/')
>>> len(response.content)
51671

If you want to keep streaming, for instance because the content is too large to hold in memory, you can iterate over chunks of the data and sum their sizes:

>>> with requests.get('https://github.com/', stream=True) as response:
...     size = sum(len(chunk) for chunk in response.iter_content(8196))
>>> size
51671
BlackJack
  • 3
    Does this just parse `Content-length` or does it actually measure the full content? Also, does response.content include HTTP headers? – ewhitt Jul 11 '14 at 17:05
  • 2
    That does determine the actual length of the content. At least the Github front page does not send a `Content-length` header. – BlackJack Jul 11 '14 at 18:12
  • Awesome! Much appreciated! – ewhitt Jul 11 '14 at 18:32
  • This does not count the HTTP headers, as `response.content` is only the response body, in the form of a byte string. – Marius Jan 02 '15 at 11:30
  • @BlackJack Is this value in bytes? I have heard that one character equals 1 byte only if the character set is ASCII latin1; in UTF-8 it can increase to 2 bytes depending on the character. – Marlon Abeykoon Sep 22 '16 at 10:45
  • 2
    @MarlonAbeykoon Yes the value is in bytes because `response.content` is bytes and not characters. If you want characters use the `response.text` attribute. Of course this only makes sense if the body actually _is_ text. If it's an image for instance, you'll get garbage or a decoding error when accessing the `text` attribute. – BlackJack Sep 22 '16 at 10:57
  • @BlackJack Actually `type(r.content)` is not bytes. Why do you say it's bytes? – Marlon Abeykoon Sep 22 '16 at 11:40
  • @MarlonAbeykoon you're using python 2, right? Strings (`str`) in python 2 are byte strings. Exec `str is bytes` for fun and profit. – Ilja Everilä Sep 22 '16 at 12:14
  • Yes, Python 2. `len()` means the number of chars in a string. So, according to my earlier comment, 1 character = 1 byte only if it's ascii-latin1; if not, we can't do this, right? – Marlon Abeykoon Sep 22 '16 at 12:21
  • 1
    @MarlonAbeykoon In Python 2 `len()` on a `str` value means number of _bytes_ and `len()` on a `unicode` value means number of _characters_. (Actually number of code points, because not every code point is a character and there are characters consisting of more than one code point.) There is no such thing as "ascii-latin1". It's either ASCII or Latin1. Latin1 has ASCII as a subset though. – BlackJack Sep 22 '16 at 13:54
  • 3
    The OP is using a *streaming response*, accessing `r.content` is going to load all the data into memory first and that is usually **not** what you want when streaming the response. – Martijn Pieters Feb 15 '18 at 22:02
  • @MartijnPieters I've added an example with streaming. – BlackJack Feb 15 '18 at 22:38
  • 1
    @BlackJack: that's not a good example. Now you consumed all the data from the socket just to get the total size. **At the very least** try to get the `Content-Length` header first. Also see [Progress Bar while download file over http with Requests](//stackoverflow.com/q/37573483) – Martijn Pieters Feb 15 '18 at 22:42
  • The question was how to get the size from a source that _doesn't_ have a `Content-Length` header. If that example isn't good, how would you get the size without consuming the iterable or loading the whole content into memory? Even if I try `Content-Length` first I need to download/consume if there is no Content-Length header. – BlackJack Feb 15 '18 at 23:10
  • How about when compression (e.g. gzip) is applied? Is this answer still right? `r.content` should be decompressed. – Carson Ip May 07 '18 at 02:43
  • 1
    @CarsonIp: no, this answer is not correct. See [Content-length header not the same as when manually calculating it?](//stackoverflow.com/q/50825528) for an accurate method. – Martijn Pieters Jun 12 '18 at 21:59
  • 1
    It's a bit harsh to call this answer incorrect because compression/decompression is something that usually is transparently done on the fly. The compressed size is rarely interesting because the data isn't stored that way nor can it be used that way if we talk about the content. To get at the content you have to decompress. – BlackJack Jun 13 '18 at 07:30
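
The bytes-versus-characters point from the comments above can be checked directly. This is a minimal sketch assuming Python 3, where `str` holds code points and `bytes` holds raw bytes (the counterparts of Python 2's `unicode` and `str`). The sample string is made up for illustration.

```python
# Ten code points; two of them are outside ASCII.
text = "na\u00efve caf\u00e9"

# In UTF-8 each of the two non-ASCII characters takes two bytes.
utf8_bytes = text.encode("utf-8")

# In Latin-1 every character in this string fits in one byte.
latin1_bytes = text.encode("latin-1")

char_count = len(text)            # counts code points
utf8_count = len(utf8_bytes)      # counts bytes
latin1_count = len(latin1_bytes)  # counts bytes

print(char_count, utf8_count, latin1_count)  # prints: 10 12 10
```

This is why `len(response.content)` is a byte count, while `len(response.text)` depends on how the body was decoded.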
7

`r.raw` is an instance of `urllib3.response.HTTPResponse`. You can get the length of the response by reading its `Content-Length` header, or by calling the built-in `len()` on the downloaded content.

defool
  • 8
    Yes, but `Content-length` is not always provided. – ewhitt Jul 11 '14 at 17:04
  • 3
    @ewhitt: If there is no `Content-length` header then you can't know the full length until you have received all data. Accessing `r.content` forces the issue, that reads from the `raw` connection until all data has been read, building up the full document in memory. You may as well not use `stream=True` in that case. – Martijn Pieters Feb 15 '18 at 22:03
  • 5
    @MartijnPieters what about `gzip` responses which it automatically decompresses, so `len(r.content)` does not show true response size ... ? – madzohan Jul 17 '18 at 19:58
  • 2
    @madzohan what about those? If you need to know the HTTP body size of the response and there is no content-length header then see https://stackoverflow.com/questions/50825528/content-length-header-not-the-same-as-when-manually-calculating-it – Martijn Pieters Jul 17 '18 at 20:44
  • @MartijnPieters thank you) I've finished yesterday with simple urllib's response.read() :) – madzohan Jul 18 '18 at 04:49
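
For the gzip case raised in the comments above, one option is to count the body bytes before requests decodes them. The sketch below assumes the response was opened with `stream=True` and that its body has not been read yet. `wire_size` is a hypothetical helper name; `raw.stream()` and its `decode_content` parameter come from urllib3's `HTTPResponse`.

```python
def wire_size(response, chunk_size=8192):
    """Count body bytes as received, leaving gzip/deflate content undecoded."""
    total = 0
    # response.raw is the underlying urllib3 HTTPResponse. Passing
    # decode_content=False asks it to yield the body without decompressing it.
    for chunk in response.raw.stream(chunk_size, decode_content=False):
        total += len(chunk)
    return total


# With requests (network access assumed):
#
#     import requests
#     with requests.get("https://github.com/", stream=True) as r:
#         print(wire_size(r))
```

This consumes the body, so `r.content` is no longer available afterwards. The count excludes the HTTP headers and any chunked-transfer framing.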