I'm using Scrapy to get data from a website. The problem is that I don't know how to fetch only the incremental data after the site has been updated on the server, or how to detect that the site has been updated at all.
What I want to crawl is the table on the web page.
The table has a column named "Add Date", so when the data is updated I only want to get the rows that were added recently. The difficulty is that the URL of the page does not change after an update. It is still https://gold.jgi.doe.gov/projects.
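To make the goal concrete, here is a rough sketch of the filtering I have in mind, written in plain Python rather than as a full spider. It assumes each table row has already been extracted into a dict, that the "Add Date" value is stored under a hypothetical `add_date` key, and that the date text looks like `YYYY-MM-DD`; I have not confirmed the real format on the page, and the names `select_new_rows` and `newest_add_date` are made up for illustration.

```python
from datetime import date, datetime

# Assumed text format of the "Add Date" column; adjust to match the real page.
DATE_FORMAT = "%Y-%m-%d"


def parse_add_date(text):
    """Convert the "Add Date" cell text into a date object."""
    return datetime.strptime(text.strip(), DATE_FORMAT).date()


def select_new_rows(rows, last_seen):
    """Keep only rows whose "Add Date" is strictly later than last_seen."""
    return [row for row in rows if parse_add_date(row["add_date"]) > last_seen]


def newest_add_date(rows, default):
    """Return the latest "Add Date" among rows, or default if rows is empty."""
    return max((parse_add_date(row["add_date"]) for row in rows), default=default)


# Hypothetical rows, as a parse callback might yield them.
rows = [
    {"project": "A", "add_date": "2016-05-01"},
    {"project": "B", "add_date": "2016-05-10"},
    {"project": "C", "add_date": "2016-05-12"},
]

# Date remembered from the previous crawl (it would be persisted between runs).
last_seen = date(2016, 5, 5)

new_rows = select_new_rows(rows, last_seen)
next_last_seen = newest_add_date(rows, default=last_seen)
```

The idea would be to store `next_last_seen` somewhere after each run and pass it in as `last_seen` on the next one. What I don't know is where this logic belongs in Scrapy, or how to avoid re-downloading the whole table every time.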
I've read the Q&A "Strategy for how to crawl/index frequently updated webpages?" and I understand a little of the theory, but I still don't know how to implement this with Scrapy. Can anybody give an example or some more detailed information?