My scrapyd instance handles 200 spiders daily, all scheduled at once. Yesterday the server crashed because it ran out of RAM.
I am using the default scrapyd settings:
[scrapyd]
http_port = 6800
debug = off
#max_proc = 1
eggs_dir = /var/lib/scrapyd/eggs
dbs_dir = /var/lib/scrapyd/dbs
items_dir = /var/lib/scrapyd/items
logs_dir = /var/log/scrapyd
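From the docs I gather that max_proc and max_proc_per_cpu are the settings that cap how many Scrapy processes scrapyd runs in parallel, so I assume something like the following would keep memory bounded (the numbers are just a guess for my hardware, not values I have verified):

[scrapyd]
http_port = 6800
debug = off
# hard cap on concurrent Scrapy processes; 0 means max_proc_per_cpu * number of CPUs
max_proc = 8
# per-core cap used when max_proc is 0
max_proc_per_cpu = 2
# how often (seconds) scrapyd polls its queues for pending jobs
poll_interval = 5.0
eggs_dir = /var/lib/scrapyd/eggs
dbs_dir = /var/lib/scrapyd/dbs
items_dir = /var/lib/scrapyd/items
logs_dir = /var/log/scrapyd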
Here is the code I use to schedule all the spiders:
import urllib
import urllib2

url = 'http://localhost:6800/schedule.json'

# Get the list of spider names in the project
crawler = self.crawler_process.create_crawler()
for spider_name in crawler.spiders.list():
    # POST one schedule request per spider to scrapyd
    values = {'project': 'myproject', 'spider': spider_name}
    data = urllib.urlencode(values)
    req = urllib2.Request(url, data)
    response = urllib2.urlopen(req)
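(If it matters: I could also fetch the spider names from scrapyd itself via its listspiders.json endpoint instead of creating a crawler in-process. This is only a sketch of that idea, not what I currently run:)

import json
import urllib
import urllib2

# Ask the running scrapyd which spiders the project has (listspiders.json is part of its API)
list_url = 'http://localhost:6800/listspiders.json?project=myproject'
spiders = json.load(urllib2.urlopen(list_url))['spiders']

# Schedule them one by one, same as above
for spider_name in spiders:
    data = urllib.urlencode({'project': 'myproject', 'spider': spider_name})
    urllib2.urlopen(urllib2.Request('http://localhost:6800/schedule.json', data))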
How can I tune the scrapyd settings so it can handle 200+ spiders without running out of memory?
Thanks