Can we only get the web page header information and not the body? (Mechanize)

Question

What if I only want to download a page when it has changed since the last download? What is the best way to do that? Can I get the size of the page first, compare it with the previous size to decide whether the page has changed, and then download it if it has or skip it otherwise?

I plan to use mechanize (Python).

score 5 · Accepted Answer · edited Jun 20 '20 at 09:12

The request should be a HEAD, not a GET. From RFC 2616 (HTTP/1.1):

9.4 HEAD

The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request. This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.

The response to a HEAD request MAY be cacheable in the sense that the information contained in the response MAY be used to update a previously cached entity from that resource. If the new field values indicate that the cached entity differs from the current entity (as would be indicated by a change in Content-Length, Content-MD5, ETag or Last-Modified), then the cache MUST treat the cache entry as stale.
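The comparison described in the quoted paragraph can be sketched roughly as follows. This is only an illustration: the function names and the rule for handling missing headers are assumptions, not part of the RFC or of mechanize.

```python
# Validators named in the quoted RFC paragraph, lowercased for comparison.
VALIDATORS = ("content-length", "content-md5", "etag", "last-modified")


def normalize(headers):
    """Lowercase header names and trim values so lookups are case-insensitive."""
    return {name.lower(): value.strip() for name, value in headers.items()}


def has_changed(old_headers, new_headers):
    """Return True if the page should be downloaded again.

    old_headers: headers saved from the previous download (hypothetical store).
    new_headers: headers returned by a fresh HEAD request.
    """
    old = normalize(old_headers)
    new = normalize(new_headers)
    # Only validators present on both sides can be compared.
    shared = [name for name in VALIDATORS if name in old and name in new]
    if not shared:
        # Assumption: with nothing to compare, err on the side of downloading.
        return True
    return any(old[name] != new[name] for name in shared)
```

Content-Length alone is a weak signal, since a page can change without changing size, so ETag or Last-Modified are usually the more reliable fields when the server sends them.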

See also: How can I perform a HEAD request with the mechanize library?
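For reference, here is a minimal sketch of a HEAD request. Note that it uses the Python 3 standard library (urllib.request) rather than mechanize, so it shows the HTTP technique and not mechanize's own API; the function names and the timeout value are assumptions for illustration.

```python
import urllib.request


def build_head_request(url):
    """Return a request object whose HTTP method is HEAD instead of GET."""
    return urllib.request.Request(url, method="HEAD")


def fetch_headers(url, timeout=10):
    """Send a HEAD request and return the response headers as a dict."""
    request = build_head_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # A HEAD response carries headers only, so there is no body to read.
        return dict(response.headers.items())
```

Calling fetch_headers("http://example.com/") (a placeholder URL) would return fields such as Content-Length or Last-Modified when the server provides them, without transferring the page body.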

Yuda Prawira · Answer 2 · 2011-05-07T12:48:17.600

0

Yes, you can get more information out of Python mechanize by turning on its debug output like this:

import mechanize

br = mechanize.Browser()
br.set_debug_http(True)       # print the raw HTTP request and response headers
br.set_debug_redirects(True)  # log HTTP redirects
# ... your code here ...

With these options set, mechanize prints the HTTP request and response headers as it runs, so you can inspect the page's header information.

edited May 07 '11 at 12:48

answered May 07 '11 at 12:02

Yuda Prawira
