I'm scraping news.crunchbase.com with Scrapy. The callback for the recursively followed links never fires when I follow the actual article link I encounter, but it works fine if I follow some test link instead. I assume the problem is timing, so I want to delay the recursive request.
EDIT: The answer from here does set a global delay, but it doesn't delay the recursive requests. The recursive link is crawled instantly, as soon as the data has been scraped.
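For reference, the global delay from that answer is applied spider-wide, roughly like this (a minimal sketch; the spider name and the 5-second value are just examples):

    import scrapy

    class NewsSpider(scrapy.Spider):
        name = 'crunchbase_news'  # example name, not my real spider
        custom_settings = {
            'DOWNLOAD_DELAY': 5,               # pause between consecutive requests
            'RANDOMIZE_DOWNLOAD_DELAY': True,  # add some jitter to the delay
        }

With this in place the listing pages are throttled, but the requests yielded via response.follow still seem to go out immediately. Here is the parse callback where the article callbacks fail to fire: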
    def parse(self, response):
        time.sleep(5)
        for post in response.css('div.herald-posts'):
            article_url = post.css('div.herald-post-thumbnail a::attr(href)').get()
            if article_url is not None:
                print('\nGot article...', article_url, '\n')
                yield response.follow(article_url, headers=self.custom_headers, callback=self.parse_article)
                yield {
                    'title': post.css('div.herald-post-thumbnail a::attr(title)').get(),
                }
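For completeness, a simplified stand-in for parse_article is below; the real selectors aren't shown here, so treat these as placeholders:

    def parse_article(self, response):
        # Placeholder body; the real callback extracts more fields.
        yield {
            'article_url': response.url,
            'article_title': response.css('h1::text').get(),
        }

Even this never gets called when I follow the real article_url, while a test URL reaches it fine.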