I've created a script in Scrapy to parse the titles of the different sites listed in start_urls. The script does its job flawlessly.
What I'd like to do now is make the script stop after two of the URLs have been parsed, no matter how many URLs are listed.
This is what I've tried so far:
import scrapy
from scrapy.crawler import CrawlerProcess

class TitleSpider(scrapy.Spider):
    name = "title_bot"
    start_urls = ["https://www.google.com/", "https://www.yahoo.com/", "https://www.bing.com/"]

    def parse(self, response):
        yield {'title': response.css('title::text').get()}

if __name__ == "__main__":
    c = CrawlerProcess({
        'USER_AGENT': 'Mozilla/5.0',
    })
    c.crawl(TitleSpider)
    c.start()
How can I make my script stop once two of the listed URLs have been scraped?