I want to know what is going on while Scrapy is running. How can I monitor its status?
2 Answers
There are two methods. For the first, here is an example from the official documentation (the telnet console):
telnet localhost 6023
>>> est()
Execution engine status
time()-engine.start_time : 8.62972998619
engine.has_capacity() : False
len(engine.downloader.active) : 16
engine.scraper.is_idle() : False
engine.spider.name : followall
engine.spider_is_idle(engine.spider) : False
engine.slot.closing : False
len(engine.slot.inprogress) : 16
len(engine.slot.scheduler.dqs or []) : 0
len(engine.slot.scheduler.mqs) : 92
len(engine.scraper.slot.queue) : 0
len(engine.scraper.slot.active) : 0
engine.scraper.slot.active_size : 0
engine.scraper.slot.itemproc_size : 0
engine.scraper.slot.needs_backout() : False
For more information, please refer to the official documentation.
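If you cannot connect, check the telnet console settings. The sketch below uses the setting names from the Scrapy settings reference; defaults may differ between versions:

# settings.py -- minimal sketch of the telnet console settings
TELNETCONSOLE_ENABLED = True        # the console is on by default
TELNETCONSOLE_PORT = [6023, 6073]   # Scrapy binds the first free port in this range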
The second method is a little simpler. You can get the crawler stats with either of the following:
self.crawler.stats.get_stats()
or
spider.crawler.stats.get_stats()
So, just print out the stats however you like.
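For illustration, here is a minimal spider sketch; the spider name, URL, and item are made up, and only crawler.stats.get_stats() comes from the answer above:

import scrapy

class StatsDemoSpider(scrapy.Spider):
    # Hypothetical example spider: log the crawler stats during and after the crawl.
    name = "stats_demo"
    start_urls = ["http://example.com"]

    def parse(self, response):
        # get_stats() returns a plain dict of the counters kept by the stats collector
        self.logger.info("Current stats: %s", self.crawler.stats.get_stats())
        yield {"url": response.url}

    def closed(self, reason):
        # Called when the spider finishes; the final stats include counters such as
        # 'downloader/request_count' and 'item_scraped_count'.
        self.logger.info("Final stats: %s", self.crawler.stats.get_stats())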

zczhuohuo
Thanks. There is a "web service" mentioned in the official documentation; do you know how to enable it? – Spy Oct 14 '14 at 07:08
It is enabled by default, but if you are not sure, you can enable it explicitly by setting WEBSERVICE_ENABLED = True. – zczhuohuo Oct 14 '14 at 08:46
There are also third-party extensions for monitoring Scrapy's status.
scrapy-jsonrpc lets you control and monitor a running Scrapy web crawler via JSON-RPC and provides a web service. It was originally built into Scrapy but is now an independent project.
The web service is documented in older versions of the Scrapy documentation.
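A minimal settings sketch for enabling it (the extension path and the JSONRPC_* setting names follow the scrapy-jsonrpc README; verify them against the version you install):

# settings.py -- sketch only, based on the scrapy-jsonrpc README
EXTENSIONS = {
    'scrapy_jsonrpc.webservice.WebService': 500,
}
JSONRPC_ENABLED = True
JSONRPC_PORT = [6080, 7030]   # the web service binds the first free port in this range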
Another project uses a StatcollectorMiddleware to store the stats of current requests in Redis, and it provides a web service as well.

J. Fan