
How can I write a Scrapy spider that runs "forever", so that it starts again whenever it reaches `def closed(self, spider):`?

This is the function that is called when the spider finishes. I tested it by printing some text, and every time the spider ends I see that text.

But how can I start the spider again after that?

class Spider(scrapy.Spider):
    def start_requests(self):
        Spidercode...

    def closed(self, spider):
        print('END')

The spider starts every round with `start_requests` and ends with `closed()`.

  • Let's see your code for creating and starting the spider. I assume you'll want to put that code in a loop somehow. – CryptoFool Jan 02 '21 at 16:13
  • @Steve my idea was to jump from `closed` back to `start_requests` again. – togmer Jan 02 '21 at 16:17
  • Why not simply call `start_requests` in your `closed`? You may also want to reset your object state first, if there is any. – Lior Cohen Jan 02 '21 at 16:17
  • Does this answer your question? [How to build a web crawler based on Scrapy to run forever?](https://stackoverflow.com/questions/2350049/how-to-build-a-web-crawler-based-on-scrapy-to-run-forever) – Gallaecio Feb 22 '21 at 03:40
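
A minimal sketch of the "run it in a loop" idea from the comments, assuming a hypothetical placeholder spider `MySpider`: Scrapy's `CrawlerRunner` returns a Deferred from each `crawl()` call, and an `inlineCallbacks` coroutine can wait for that Deferred and then start the next run inside the Twisted reactor.

import scrapy
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from twisted.internet import reactor, defer
from twisted.internet.task import deferLater

class MySpider(scrapy.Spider):
    """Hypothetical placeholder spider; substitute your own."""
    name = 'my_spider'
    start_urls = ['https://example.com']

    def parse(self, response):
        yield {'title': response.css('title::text').get()}

configure_logging()
runner = CrawlerRunner(settings={})

@defer.inlineCallbacks
def loop_crawl():
    while True:
        yield runner.crawl(MySpider)                # run one full crawl to completion
        yield deferLater(reactor, 5, lambda: None)  # pause 5 seconds before restarting

loop_crawl()
reactor.run()  # blocks; the coroutine above keeps scheduling new crawls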

1 Answer

import scrapy
from scrapy.crawler import CrawlerProcess
from twisted.internet import reactor
from twisted.internet.task import deferLater

...  # MySpider (your scrapy.Spider subclass) is defined here

runner = CrawlerProcess(settings={})

def sleep(*args, seconds=0):
    """Non-blocking sleep callback."""
    return deferLater(reactor, seconds, lambda: None)

def crash(failure):
    """Errback: log the failure so the chain does not stop silently."""
    print(failure)

def crawl(result):
    d = runner.crawl(MySpider)
    d.addCallback(lambda results: print('waiting 0 seconds before restart...'))
    d.addErrback(crash)              # <-- handle errors from the crawl
    d.addCallback(sleep, seconds=0)  # non-blocking wait before the next run
    d.addCallback(crawl)             # then schedule the next crawl
    return d

crawl(None)
runner.start()
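
Each `runner.crawl()` call returns a Twisted Deferred that fires when that crawl finishes; the callbacks then sleep for the given number of seconds and call `crawl` again, so a new run is scheduled every time the previous one ends. `runner.start()` starts the Twisted reactor and blocks, so the script keeps running while the crawls chain into each other.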

