
I'm working on a project that involves crawling multiple domains in their entirety. My scraper simply crawls the whole domain; it doesn't inspect specific parts of the HTML, it just collects all of it.

For some domains I would only want to crawl one subdomain, but other than that, everything about the crawl itself would be the same for each domain.

Everything else that differs between the domains would be handled once the crawl is finished, in a separate Python script.

My question is: do I need to write a unique Scrapy spider for each domain, or can I use a single one and pass it parameters for allowed_domains/start_urls?

I'm using Scrapinghub, and I may need to run the crawl on all my domains at the same time.

