I'm using Scrapy-Splash requests to get a rendered screenshot of a page, but I also need the images on that page. I use the pipelines to download those images, but I was thinking - does this not make two requests for the same image? Once when Splash is rendering the page and once when I send a download request. Is there a way I can get the images returned by the Scrapy-Splash request?

Akustik

1 Answer

You can enable response bodies (use either the `response_body` argument or, in a Lua script, `splash.response_body_enabled = true`) and then extract the images from the HAR export.
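For anyone wanting to see what the extraction step might look like, here is a minimal sketch. It assumes the HAR follows the standard layout (`log.entries[*].response.content`) and that Splash stores the body base64-encoded in `content.text` when response bodies are enabled. `extract_images` is a hypothetical helper name, not part of Splash or scrapy-splash.

```python
import base64


def extract_images(har):
    """Return {url: raw_bytes} for every image response found in a HAR dict.

    Assumes the standard HAR layout; entries without a body (for example,
    responses served from the Splash cache) are skipped.
    """
    images = {}
    for entry in har.get("log", {}).get("entries", []):
        content = entry.get("response", {}).get("content", {})
        mime = content.get("mimeType") or ""
        text = content.get("text")
        if not mime.startswith("image/") or not text:
            continue
        if content.get("encoding") == "base64":
            body = base64.b64decode(text)
        else:
            # Fallback for bodies stored as plain text (e.g. some SVG responses)
            body = text.encode("utf-8")
        images[entry["request"]["url"]] = body
    return images
```

In a spider callback using the `render.json` endpoint, this would probably be called as `extract_images(response.data["har"])`, assuming the request was sent with `'har': 1` and `'response_body': 1`.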

Mikhail Korobov
  • Thank you, that works. To be more precise, here's some code in case anyone looks this up: you need to add `'response_body': 1, 'har': 1` to `splash_args`, and that should give you the HAR data from the `render.json` or `render.har` endpoint. – Akustik Jul 24 '17 at 08:56
  • Would you know how to get a `response_body` returned every time? At the moment I only get it the first time I visit a site. I assume this happens because of the Splash cache? – Akustik Jul 24 '17 at 14:51
  • Yes, it happens because of the cache. Currently there is no way to disable this cache; it is possible to clear it using the [_gc](http://splash.readthedocs.io/en/stable/api.html#gc) endpoint, but that is just a workaround. – Mikhail Korobov Jul 24 '17 at 18:03
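
The `_gc` workaround mentioned in the last comment could be scripted roughly like this. It is only a sketch: it assumes Splash is reachable at `http://localhost:8050` and that `_gc` is called with a POST request, as the linked docs describe. `build_gc_request` and `clear_splash_cache` are hypothetical helper names.

```python
from urllib.request import Request, urlopen


def build_gc_request(splash_url="http://localhost:8050"):
    """Build (but do not send) a POST request to Splash's _gc endpoint."""
    return Request(splash_url.rstrip("/") + "/_gc", data=b"", method="POST")


def clear_splash_cache(splash_url="http://localhost:8050", timeout=10):
    """Send the _gc request and return the HTTP status code."""
    with urlopen(build_gc_request(splash_url), timeout=timeout) as resp:
        return resp.status
```

Calling `clear_splash_cache()` before re-rendering a page should make the response bodies show up in the HAR again, at the cost of an extra round trip per crawl.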