I'm used to running spiders one at a time, because we mostly work with `scrapy crawl` and on Scrapinghub, but I know that one can run multiple spiders concurrently, and I have seen that middlewares often have a `spider` parameter in their callbacks.
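For context, by "running multiple spiders concurrently" I mean something like the `CrawlerProcess` pattern from the docs; `SpiderOne`/`SpiderTwo` below are just throwaway placeholders:

```python
import scrapy
from scrapy.crawler import CrawlerProcess

class SpiderOne(scrapy.Spider):   # placeholder spider, just for illustration
    name = "one"
    start_urls = ["https://example.com"]

    def parse(self, response):
        yield {"url": response.url}

class SpiderTwo(scrapy.Spider):   # placeholder spider
    name = "two"
    start_urls = ["https://example.org"]

    def parse(self, response):
        yield {"url": response.url}

# Both spiders run in the same process, sharing one reactor.
process = CrawlerProcess()
process.crawl(SpiderOne)
process.crawl(SpiderTwo)
process.start()  # blocks until both crawls have finished
```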
What I'd like to understand is:
- the relationship between `Crawler` and `Spider`. If I run one spider at a time, I'm assuming there's one of each. But if you run more spiders together, like in the example linked above, do you have one crawler for multiple spiders, or are they still 1:1?
- is there in any case only one instance of a middleware of a certain class, or do we get one per spider or per crawler?
- assuming there's only one instance, what are the `crawler.settings` passed in at middleware creation (for example, here)? The documentation says those take into account the settings overridden in the spider, but if there are multiple spiders with conflicting settings, what happens? (The skeleton right after this list shows the pattern I mean.)
I'm asking because I'd like to know how to handle spider-specific settings. Take again the DeltaFetch middleware as an example:
- enabling it seems to be a global matter, because `DELTAFETCH_ENABLED` is read from the `crawler.settings`;
- however, the sqlite db is opened in `spider_opened` and stored in a single instance attribute (i.e., not depending on the spider), so if you have more than one spider and the instance is shared, the old db is lost when the second spider is opened. And if you have only one instance of the middleware per spider, why bother passing the spider as a parameter? (The pattern I mean is sketched just below.)
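If I'm reading it right (this is a paraphrase from memory, not the actual DeltaFetch source), the shape of the problem is roughly:

```python
import os
import sqlite3

class SharedInstancePattern:
    """Paraphrase of how I read the current pattern -- not the real
    DeltaFetch code, just the shape of it."""

    def __init__(self, dir):
        self.dir = dir
        self.db = None  # single attribute, shared by whichever spiders use this instance

    def spider_opened(self, spider):
        # If the instance is shared and a second spider opens while the
        # first is still running, this silently replaces the first db.
        path = os.path.join(self.dir, f"{spider.name}.db")
        self.db = sqlite3.connect(path)
```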
Is that a correct way of handling it, or should you rather have a dict `spider_dbs` indexed by spider name?
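I.e., something along these lines (just my guess at what that would look like, not tested):

```python
import os
import sqlite3

class PerSpiderDbPattern:
    """Rough sketch of the per-spider variant I have in mind."""

    def __init__(self, dir):
        self.dir = dir
        self.spider_dbs = {}  # spider.name -> sqlite connection

    def spider_opened(self, spider):
        # One db handle per spider, so concurrent spiders don't clobber
        # each other's state.
        path = os.path.join(self.dir, f"{spider.name}.db")
        self.spider_dbs[spider.name] = sqlite3.connect(path)

    def spider_closed(self, spider):
        db = self.spider_dbs.pop(spider.name, None)
        if db is not None:
            db.close()
```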