I'm trying to scrape part of a parts website that contains images of the parts, to collect some statistics. However, neither the URL nor the image carries an upload or creation date, so I have to use the image file's approximate modification date instead. Using cURL, this is easy:

curl -sI https://path.to.com/blahblah_123/item_picture.jpg | grep -i "last-modified"

However, I think it would be more convenient to get this within the Scrapy spider itself. I have no idea whether Scrapy supports this at all, since I cannot find it in the documentation.

Is there a way to get the last-modified date of a scraped image in Scrapy?

not2qubit
  • You can use `python-requests` instead of Scrapy and get the required header value with `requests.get('https://path.to.com/blahblah_123/item_picture.jpg').headers['last-modified']` – Andersson Sep 17 '18 at 10:26
  • This is also a very good alternative. – not2qubit Sep 17 '18 at 11:16

1 Answer


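According to the Scrapy documentation, the response object has a headers attribute, a dictionary-like object holding the response headers.

So you can read the value with response.headers.get('Last-Modified'). Note that Scrapy returns header values as bytes, and get() returns None when the header is absent.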
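
Since you want the value for statistics, you will probably want a real datetime rather than the raw header. Here is a minimal sketch of that conversion using only the standard library. The helper name parse_last_modified is my own invention, not part of Scrapy, and the commented usage assumes a normal spider callback that receives the image response.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional, Union


def parse_last_modified(raw: Union[bytes, str, None]) -> Optional[datetime]:
    """Convert a raw Last-Modified header value into a datetime.

    Hypothetical helper (not a Scrapy API). Accepts the bytes value that
    response.headers.get('Last-Modified') yields, a plain str, or None.
    Returns None if the header is missing or cannot be parsed.
    """
    if raw is None:
        return None
    if isinstance(raw, bytes):
        # Header values are expected to be ASCII; latin-1 never fails to decode.
        raw = raw.decode("latin-1")
    try:
        return parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None


# Assumed usage inside a spider callback (names here are illustrative):
#
#     def parse_image(self, response):
#         modified = parse_last_modified(response.headers.get('Last-Modified'))
#         yield {'url': response.url, 'last_modified': modified}
```

Keep in mind that the server is not obliged to send Last-Modified at all, so the None case is worth handling in whatever collects the statistics.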

tdelev