
I know I can check the last modification time with

wget -S http://www.staticpage.com

as long as the page is static. But when I do the same for a dynamic page, I always get the current time.

So, what is the least intrusive way to ask a site whether a page has changed since some arbitrary time, or when it was last updated? I could obviously download the whole page and compare it with the copy I have saved on file, but I want to reduce overhead.

Quora Feans
  • You can only trust the modification date the server reports. With a dynamic page there is no way to detect when the page was modified if the server always reports the current time! – RaviH Jan 29 '14 at 13:55

1 Answer


A dynamic page is regenerated on every page load, so the server has no meaningful modification time to report. If you want to know when a dynamic page was last updated, you're going to need to look at the page itself or at an RSS feed for the page. Your best bet is generally to download it and parse out the date of the latest post.
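
If you go the download-and-compare route, you don't need to keep the old copy around; storing a hash of the body is enough. A minimal sketch, assuming `sha256sum` from GNU coreutils is available. The function name `content_changed` and the state file `page.sha256` are made up for illustration:

```shell
# Hypothetical helper: reads a page body on stdin and compares its SHA-256
# with the hash stored in the state file given as $1.
# Returns 0 (and updates the state file) if the content changed or no hash
# was stored yet; returns 1 if the content is unchanged.
content_changed() {
    state_file=$1
    # Hash whatever arrives on stdin; keep only the hex digest.
    new_hash=$(sha256sum | cut -d' ' -f1)
    # A missing state file simply yields an empty old hash.
    old_hash=$(cat "$state_file" 2>/dev/null || true)
    if [ "$new_hash" = "$old_hash" ]; then
        return 1
    fi
    printf '%s\n' "$new_hash" > "$state_file"
    return 0
}
```

It could be used as `curl -s http://someurl.com | content_changed page.sha256 && echo "page changed"`. Keep in mind that many dynamic pages embed timestamps or session tokens, so the whole-body hash may differ on every load; in that case, extract the part you care about before hashing.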

UPDATE: If you want to limit the amount of data you read when downloading a page you can use the following:

curl http://someurl.com | head -c 512

With this, head exits after reading 512 bytes, and curl aborts the transfer when its next write to the closed pipe fails. The server only notices once the connection is closed, so it may already have sent more than 512 bytes, but at least you aren't transferring the rest of the page.
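
To compare only the start of the page between runs, the first bytes can be hashed instead of stored. A rough sketch, again assuming `sha256sum` is available; `prefix_fingerprint` is a hypothetical name, not an existing tool:

```shell
# Hypothetical helper: prints the SHA-256 of the first N bytes read from
# stdin (N is $1, defaulting to 512).
prefix_fingerprint() {
    head -c "${1:-512}" | sha256sum | cut -d' ' -f1
}
```

For example, `curl -s http://someurl.com | prefix_fingerprint 512` prints a digest you can save and compare on the next run. You can also ask the server for just that byte range with `curl -s -r 0-511 http://someurl.com`, which sends a Range header, but servers that generate pages dynamically often ignore it and send the full page, so it is worth keeping the `head -c` guard either way.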

krowe
  • But, do I have to download the whole page? Can't I just download a part, some words here and there and see if they match with the old file locally saved? That would be like a fingerprint of the page. – Quora Feans Jan 29 '14 at 14:05
  • See my update for how to prevent loading the entire page. Unfortunately, the majority of pages on the web which are dynamic are going to have a mostly static heading so most of what you want will be in the body. – krowe Jan 29 '14 at 14:14