
I am trying to build a scraper with Scrapy, and I plan to use DeltaFetch for incremental refresh. I also need to parse JavaScript-based pages, which is why I need Splash as well. In settings.py, enabling DeltaFetch requires SPIDER_MIDDLEWARES = {'scrapylib.deltafetch.DeltaFetch': 100,}, whereas Splash requires SPIDER_MIDDLEWARES = {'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,}.

I wanted to know how the two would work together, given that both of them rely on a spider middleware.

Is there some way in which I could use both of them?

Aayush Agrawal

1 Answer

For other answers, see here and here. Essentially, you can use the request's meta parameter to set the deltafetch_key manually for each request you make. That way, you can request the same page with Splash even after you've successfully scraped items from it with plain Scrapy, and vice versa. Hope that helps!

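import scrapy
from scrapy_splash import SplashRequest
from scrapy.utils.request import request_fingerprint

# ... (your spider code here) ...

    # Inside a callback: set the key explicitly instead of letting DeltaFetch derive it.
    # Here the key is the fingerprint of the request that produced the current response.
    yield scrapy.Request(url, meta={'deltafetch_key': request_fingerprint(response.request)})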
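
On the settings side of the question: SPIDER_MIDDLEWARES is a single dict, so both middlewares can be listed in it together rather than assigned twice (a second assignment would simply overwrite the first). Below is a minimal sketch of the relevant part of settings.py. The order values 100 and 110 are my own illustrative assumption, not something either project prescribes, and Splash needs its other settings (downloader middlewares, SPLASH_URL, dupefilter) configured as well, which are left out here.

```python
# settings.py (fragment): enable both spider middlewares in one dict.
# The order values below are illustrative assumptions; adjust them if
# your project needs a specific ordering between the two.
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
    'scrapylib.deltafetch.DeltaFetch': 110,
}

# DeltaFetch does nothing unless it is switched on explicitly.
DELTAFETCH_ENABLED = True
```

With both entries present, the deltafetch_key set in the request meta above is what decides whether DeltaFetch skips a request.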
CLPatterson