
I'm working on a project that involves crawling multiple domains in their entirety. My scraper simply crawls the whole domain; it doesn't inspect specific parts of the HTML, it just collects all of it.

For some domains I would only want to crawl one subdomain, but other than that, everything about the crawl itself would be the same for each domain.

Everything else that differs between the domains would be handled once the crawl is finished, in a separate Python script.

My question is: do I need to write a unique Scrapy spider for each domain, or can I use a single one and pass it parameters for allowed_domains/start_urls?

I'm using Scrapinghub, and I may need to run the crawl on all my domains at the same time.

