
I was a bit surprised by this while trying to understand the requests module.

>>> import requests
>>> furl = 'http://www.downvids.net/downloads/07275feaf477cc0f5a7a67cba965594d5c83/'

>>> resp = requests.get(furl, headers={'Accept-Encoding': 'identity'})
>>> resp.headers['content-length']
'7254371'
>>> resp2 = requests.head(furl)
>>> resp2.headers['content-length']
'20'

With requests.get, though, I think it downloads the whole file into a buffer and takes the content length from that.

So what is the correct approach to get the right content-length when the URL redirects? That seems to be the case here, because resp2.status_code gave me 302.

Ciasto piekarz
  • I think [this](http://stackoverflow.com/questions/23345225/http-head-method-content-length-does-not-match-with-size-on-view-page-info?rq=1) is seeking a similar answer; however, I am using the `requests` module, whereas he is asking in general. – Ciasto piekarz Jul 27 '14 at 12:07

1 Answer


When doing a HEAD request, requests sets allow_redirects to False by default; this differs from all the other HTTP methods, where following redirects is on by default. See the Redirection and History documentation:

By default Requests will perform location redirection for all verbs except HEAD.

You can force it to follow redirects by setting allow_redirects=True:

resp2 = requests.head(furl, allow_redirects=True)
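
To see what that call does with the redirect chain, here is a minimal sketch. The helper name head_content_length and the timeout value are my own choices for illustration, not part of requests; it only relies on resp.history, resp.url and resp.headers.

```python
import requests


def head_content_length(url, timeout=10):
    """Send a HEAD request that follows redirects.

    Returns (final_url, length), where length is an int, or None if the
    final response carried no Content-Length header.
    """
    resp = requests.head(url, allow_redirects=True, timeout=timeout)
    # resp.history holds the intermediate redirect responses (e.g. the 302).
    for hop in resp.history:
        print(hop.status_code, hop.url, "->", hop.headers.get("Location"))
    length = resp.headers.get("Content-Length")
    return resp.url, int(length) if length is not None else None
```

Calling head_content_length(furl) should then report the length sent by the final URL rather than by the 302 response, provided the server answers HEAD with that header.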

Your GET did follow the redirect (from the original URL to https://scontent-b-ams.xx.fbcdn.net/hvideo-xpa1/v/t42.1790-2/1598692_10153946120225652_1024334852_n.mp4?oh=de27dad30979955f4e8fef28b85f9af9&oe=53D50345); your HEAD request did not.

Servers SHOULD return the same headers for a HEAD as they do for a GET, but the RFC keyword SHOULD also means that a server may ignore that requirement if the implementation would be too costly or for any other valid reason.

You can always make a GET request that doesn't download the body up front by setting stream=True; the body is then only fetched when you access it:

 resp = requests.get(furl, stream=True)
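
As a rough sketch of that approach: the helper name get_content_length and the timeout are assumptions of mine, and the Accept-Encoding: identity header is taken from the question so the reported length refers to the uncompressed file.

```python
import requests


def get_content_length(url, timeout=10):
    """Read Content-Length via GET without downloading the body.

    Returns an int, or None if the server sent no Content-Length header.
    """
    # stream=True returns once the headers have arrived; the body is only
    # fetched when resp.content, resp.text or resp.iter_content() is used.
    # Leaving the with block closes the connection with the body unread.
    with requests.get(
        url,
        stream=True,
        timeout=timeout,
        headers={"Accept-Encoding": "identity"},
    ) as resp:
        resp.raise_for_status()
        length = resp.headers.get("Content-Length")
        return int(length) if length is not None else None
```

Because GET follows redirects by default, this works on the original URL without any extra arguments.
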
Martijn Pieters
  • So there is no need to use `requests.get` then; however, could there be any situation where `requests.head` won't get the `content-length` even when `allow_redirects` is set to `True`? – Ciasto piekarz Jul 27 '14 at 13:00
  • The server *should* give you the same headers as a GET would return; in practice, some servers may still violate that rule. There is nothing you can do about that other than use `requests.get()` with `stream=True` set and then simply not read the response body. – Martijn Pieters Jul 27 '14 at 17:34