I am new to Scrapy and am trying to play around with the framework. What is really frustrating is that when I run `scrapy crawl (name of spider)`, it runs every single spider in my "spiders" folder, so I either have to wait for all of the spiders to finish or comment out every spider except the one I am working with. It is very annoying. How can I make Scrapy run only one spider at a time?
- Did you give all your spiders the same name (the one defined in `SpiderClass.name`)? Calling `scrapy crawl spider_name` shouldn't run multiple spiders at once. – Kamoo Oct 28 '19 at 03:09
- No, that's why it is strange. They are all named differently. I checked my settings file to see if I had set something up the wrong way, but I don't see anything strange. – Tom H Oct 31 '19 at 01:24
2 Answers
You can run Scrapy from a script (https://scrapy.readthedocs.io/en/latest/topics/practices.html#run-from-script), so only the spider you pass to the crawler is executed, for example:
import scrapy
from scrapy.crawler import CrawlerProcess

class YourSpider(scrapy.Spider):
    # Your spider definition
    name = "your_spider"

process = CrawlerProcess()
process.crawl(YourSpider)  # only the spider passed here is scheduled
process.start()            # the script blocks here until crawling finishes
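If your spiders live inside a Scrapy project, a variant of the same idea loads the project settings and selects a single spider by its `name` attribute. A minimal sketch, assuming a spider named "your_spider" exists in the project:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Load the project's settings.py so pipelines, middlewares, etc. still apply
process = CrawlerProcess(get_project_settings())

# Passing the spider's name (a string) works because the process looks it up
# through the project's spider loader; only this one spider is crawled
process.crawl("your_spider")
process.start()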

It shouldn't be running the other spiders in full, though Scrapy does import and run through some of their code, since that is how it collects the spider names (I assume there are other reasons as well, otherwise it seems like an odd way to set things up). If you post your spider, we can see what might be running vs. not.
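The usual culprit is code sitting at module level: because Scrapy imports every module listed in SPIDER_MODULES to discover the `name` attributes, anything outside a class or function body executes on every crawl, no matter which spider you asked for. A minimal sketch (hypothetical spider and file name):

# other_spider.py  (a hypothetical file inside the project's spiders/ folder)
import scrapy

# Module-level code: this runs whenever Scrapy imports the file to discover
# spider names, i.e. on every `scrapy crawl ...`, even if this spider is
# never the one being crawled.
print("side effect that runs on every crawl")

class OtherSpider(scrapy.Spider):
    name = "other"

    def parse(self, response):
        # Code in here only runs when `scrapy crawl other` is invoked
        pass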
I had the same issue: my spiders modified CSV files, including renaming and deleting them, which was screwing things up when I only wanted to run one specific spider. My solution was to have the spiders do those tasks only when they were actually opened or closed, using signals. The documentation is here: https://docs.scrapy.org/en/latest/topics/signals.html, though I found it lacking. Here is the code I used. The from_crawler section can be left alone aside from changing the spider class name; put whatever you'd like in the spider_closed portion.
# At the top of the spider module:
import os
from scrapy import signals

# Inside the spider class:
    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super(SixPMSpider, cls).from_crawler(crawler, *args, **kwargs)
        # Fire spider_closed only for the spider that actually ran
        crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
        return spider

    def spider_closed(self, spider):
        # Swap the freshly written price list in place of the old one
        os.remove(self.name + '_price_list.csv')
        os.rename(self.name + '_price_list2.csv', self.name + '_price_list.csv')
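If the setup work (rather than the cleanup) is what interferes with other runs, the same pattern works with the spider_opened signal, so per-spider preparation also only happens for the spider that is actually crawled. A sketch under the same assumptions (the file name is hypothetical):

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super(SixPMSpider, cls).from_crawler(crawler, *args, **kwargs)
        # Connect both signals; each handler fires only for this spider's crawl
        crawler.signals.connect(spider.spider_opened, signal=signals.spider_opened)
        crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
        return spider

    def spider_opened(self, spider):
        # Hypothetical setup: start a fresh temporary file for this run
        open(self.name + '_price_list2.csv', 'w').close()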

- So the thing is, I have a number of spiders, and I am not even sure which one I would need to show since the problem affects the entire folder. Some of the spiders are written differently than others, but as far as I can see they aren't connected. I am knee-deep in another project right now but will post some of my code soon; maybe you will see something I'm missing. – Tom H Oct 31 '19 at 01:26