I am running a batch of 500 crawl jobs on Scrapyd, fired from a shell script. I hit this issue both locally on a Mac and on an EC2 instance. These crawl jobs have been working fine in batches of 100, but when I run 500, after about 300 crawls it throws "sqlite3.OperationalError: unable to open database file".
Note: each crawl (one spider) is its own project deployed on Scrapyd, which means a full run has 500 projects deployed.
After about 300 crawls are done I start seeing this exception and cannot deploy any more projects. Restarting the Scrapyd server does not help; it throws the same exception on startup.
The only way I can get it to start and crawl again is by:
- stopping the server
- rm -rf the dbs files
- rm -rf eggs (probably not required)
- rm -rf logs (probably not required)
- starting the server
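For reference, the reset steps above as a small script. SCRAPYD_HOME is my shorthand here for the directory Scrapyd was started from (where dbs/, eggs/ and logs/ live); adjust it to your setup, and note the restart line is left commented since how Scrapyd is launched varies.

```shell
#!/bin/sh
# Reset Scrapyd state after the "unable to open database file" errors start.
# SCRAPYD_HOME is an assumption -- point it at the directory holding dbs/, eggs/, logs/.
SCRAPYD_HOME="${SCRAPYD_HOME:-$HOME/scrapyENV/bin}"

# stop the server (matches the scrapyd process by command line)
pkill -f scrapyd 2>/dev/null || true

rm -rf "$SCRAPYD_HOME/dbs"    # the queue databases -- this is the step that matters
rm -rf "$SCRAPYD_HOME/eggs"   # probably not required
rm -rf "$SCRAPYD_HOME/logs"   # probably not required

# restart -- uncomment and adapt to however you normally launch it
# scrapyd &
```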
Any ideas why this would happen? Here is the exception:
2017-04-13T23:28:57+0000 [stdout#info] 1
2017-04-13T23:28:57+0000 [stdout#info] Traceback (most recent call last):
2017-04-13T23:28:57+0000 [stdout#info]   File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
2017-04-13T23:28:57+0000 [stdout#info]   File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/lib/python2.7/site-packages/scrapyd/runner.py", line 39, in <module>
2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/lib/python2.7/site-packages/scrapyd/runner.py", line 34, in main
2017-04-13T23:28:57+0000 [stdout#info]   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/lib/python2.7/site-packages/scrapyd/runner.py", line 13, in project_environment
2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/__init__.py", line 14, in get_application
2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/app.py", line 37, in application
2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/website.py", line 35, in __init__
2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/website.py", line 38, in update_projects
2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/poller.py", line 30, in update_projects
2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/utils.py", line 61, in get_spider_queues
2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/spiderqueue.py", line 12, in __init__
2017-04-13T23:28:57+0000 [stdout#info]   File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/sqlite.py", line 98, in __init__
2017-04-13T23:28:57+0000 [stdout#info] sqlite3.OperationalError: unable to open database file
2017-04-13T23:28:57+0000 [_GenericHTTPChannelProtocol,673,10.0.3.119] Unhandled Error
    Traceback (most recent call last):
      File "/home/ec2-user/scrapyENV/local/lib64/python2.7/site-packages/twisted/web/http.py", line 1845, in allContentReceived
        req.requestReceived(command, path, version)
      File "/home/ec2-user/scrapyENV/local/lib64/python2.7/site-packages/twisted/web/http.py", line 766, in requestReceived
        self.process()
      File "/home/ec2-user/scrapyENV/local/lib64/python2.7/site-packages/twisted/web/server.py", line 190, in process
        self.render(resrc)
      File "/home/ec2-user/scrapyENV/local/lib64/python2.7/site-packages/twisted/web/server.py", line 241, in render
        body = resrc.render(self)
    --- <exception caught here> ---
      File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/webservice.py", line 17, in render
        return JsonResource.render(self, txrequest)
      File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/utils.py", line 19, in render
        r = resource.Resource.render(self, txrequest)
      File "/home/ec2-user/scrapyENV/local/lib64/python2.7/site-packages/twisted/web/resource.py", line 250, in render
        return m(request)
      File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/webservice.py", line 68, in render_POST
        spiders = get_spider_list(project)
      File "/home/ec2-user/scrapyENV/local/lib/python2.7/site-packages/scrapyd/utils.py", line 116, in get_spider_list
        raise RuntimeError(msg.splitlines()[-1])
    exceptions.RuntimeError: sqlite3.OperationalError: unable to open database file
My guess is that Scrapyd is running out of some resource after about 300 projects, which is why popen fails, but the box still appears to have free disk space. Any pointers would be helpful.
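These are roughly the checks I am using to compare disk space against open-file usage, in case someone spots a limit I am missing (the pgrep match on "scrapyd" is an assumption about the process name):

```shell
#!/bin/sh
# Rule out the two usual suspects behind sqlite3 "unable to open database file":
# exhausted disk space vs. the per-process open-file limit.
df -h .        # free disk space on this filesystem
ulimit -n      # per-process open-file limit (commonly 1024 by default)

# Count file descriptors the scrapyd process currently holds.
# pgrep -f scrapyd is an assumed way to find the PID; works on Linux and macOS.
SCRAPYD_PID=$(pgrep -f scrapyd | head -n1)
if [ -n "$SCRAPYD_PID" ]; then
    lsof -p "$SCRAPYD_PID" | wc -l
fi
```

If the fd count is anywhere near the `ulimit -n` value, that would explain why opening one more .db file fails even with disk space free.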
I am running Scrapyd 1.3.3 on an EC2 instance with the default config and Python 2.7.
Running lsof on the dbs folder shows two entries for each .db file. Is this expected?
scrapyd 6363 ec2-user 1005u REG 202,1 2048 148444 /home/ec2-user/scrapyENV/bin/dbs/LatamPtBlogGenesysCom.db
scrapyd 6363 ec2-user 1006u REG 202,1 2048 148444 /home/ec2-user/scrapyENV/bin/dbs/LatamPtBlogGenesysCom.db
scrapyd 6363 ec2-user 1007u REG 202,1 2048 148503 /home/ec2-user/scrapyENV/bin/dbs/WwwPeeblesshirenewsCom.db
scrapyd 6363 ec2-user 1009u REG 202,1 2048 148503 /home/ec2-user/scrapyENV/bin/dbs/WwwPeeblesshirenewsCom.db