I did encounter some strange behaviour (to my perspective of knowledge) of SplashRequest's callback when it's executed by scrapyd.
Scrapy Sourcecode
from scrapy.spiders.Spider import Spider
from scrapy import Request
import scrapy
from scrapy_splash import SplashRequest
class SiteSaveSpider(Spider):
def __init__(self, domain='', *args, **kwargs):
super(SiteSaveSpider, self).__init__(*args, **kwargs)
self.start_urls = [domain]
self.allowed_domains = [domain]
name = "sitesavespider"
def start_requests(self):
for url in self.start_urls:
yield SplashRequest(url, callback=self.parse, args={'wait':0.5})
print "TEST after yield"
def parse(self, response):
print "TEST in parse"
with open('/some_path/test.html', 'w') as f:
for line in response.body:
f.write(line)
Log for internal Scrapy Crawler
The callback parse function is executed when started by
scrapy crawl sitesavespider -a domain="https://www.facebook.com"
...
2017-01-29 14:12:37 [scrapy.core.engine] INFO: Spider opened
2017-01-29 14:12:37 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
TEST after yield
2017-01-29 14:12:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.facebook.com via http://127.0.0.1:8050/render.html> (referer: None)
TEST in parse
2017-01-29 14:12:55 [scrapy.core.engine] INFO: Closing spider (finished)
...
Log for scrapyd
When starting the same spider by using scrapyd it will return directly after the SplashRequest:
>>>scrapyd.schedule("feedbot","sitesavespider",domain="https://www.facebook.com")
u'f2f4e090e62d11e69da1342387f8a0c9'
cat f2f4e090e62d11e69da1342387f8a0c9.log
...
2017-01-29 14:19:34 [scrapy.core.engine] INFO: Spider opened
2017-01-29 14:19:34 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-01-29 14:19:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.facebook.com via http://127.0.0.1:8050/render.html> (referer: None)
2017-01-29 14:19:58 [scrapy.core.engine] INFO: Closing spider (finished)
...
Is somebody aware of this problem or could help me to find my mistakes?