I have this simple code:
import scrapy
import re
import json
# from scrapy.http import FormRequest
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class SpiderRecipe(CrawlSpider):
    name = "recipe"
    start_urls = [
        # 'https://www.giallozafferano.it/',
        'https://ricetta.it/dolci?page=1',
        # 'https://www.buonissimo.it/',
        # 'https://migusto.migros.ch/it.html'
    ]

    def parse(self,response):
        URL = response.request.url()
        if URL.split('/')[2] == "www.ricetta.it":
            recipes = response.xpath('//div[contains(@class,"row")]/div[contains(@class,"post-img-left")]').extract()
            # iterate through each recipe in a page
            for x in recipes.extract():
                title = response.xpath(recipes + '/a[contains(@class, "post-title")]/text()').extract()[x]
                image = response.xpath(recipes + '/div[contains(@class,"videoContainer")]/img/@src').extract()[x]
                description = response.xpath(recipes + '/p[contains(@class,"post-excerpt")]/text()').extract()[x]
                yield {
                    'Title': title,
                    'Image': image,
                    'Description': description,
                }
            page = int(URL.split('=')[1]) + 1;
            if (page <= 148):
                # iterate through each page of recipes
                yield scrapy.Request(URL.split('=')[0] + str(page), callback=self.parse, dont_filter=True)
I run it from the terminal using scrapy runspider recipe.py -o output.json.
The first part of the code works, because the spider picks up the starting URL, but I don't understand why the parse function is never called. Leaving aside whether the code inside it is correct, I tried printing a string at the beginning of the function and nothing was printed. I looked for solutions, but my function is inside the class and the start URL is correct. It is probably something very simple, but I cannot find it. I also read that the function must be called explicitly, but none of the examples do that, and in any case I already pass it as the callback in the request at the end of the code.
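To make the URL handling in parse concrete, here is a minimal standalone sketch, independent of Scrapy, of what the two split expressions evaluate to for the start URL. The helper names host_of and next_url_as_written are hypothetical and exist only for this illustration; they copy the expressions from parse and are not part of the spider.

```python
START_URL = 'https://ricetta.it/dolci?page=1'


def host_of(url):
    """Hypothetical helper: the same expression parse uses to read the host."""
    return url.split('/')[2]


def next_url_as_written(url):
    """Hypothetical helper: builds the next page URL exactly as parse does."""
    page = int(url.split('=')[1]) + 1
    return url.split('=')[0] + str(page)


if __name__ == "__main__":
    print(host_of(START_URL))                      # ricetta.it
    print(host_of(START_URL) == "www.ricetta.it")  # False
    print(next_url_as_written(START_URL))          # https://ricetta.it/dolci?page2
```

If this sketch matches what the spider sees, the comparison with "www.ricetta.it" is False for the start URL and the rebuilt URL loses its '=' sign, which may be worth ruling out separately from whether parse is called at all.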