Scrapy is not Crawling the next page url

Question

My spider is not crawling page 2, even though the XPath returns the correct next-page link, which is an absolute URL.

Here is my code:

from scrapy import Spider
from scrapy.http import Request, FormRequest



class MintSpiderSpider(Spider):

    name = 'Mint_spider'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com/']

    def parse(self, response):
        urls =  response.xpath('//div[@class = "post-inner post-hover"]/h2/a/@href').extract()

        for url in urls:
            yield Request(url, callback=self.parse_lyrics)

        next_page_url = response.xpath('//li[@class="next right"]/a/@href').extract_first()
        if next_page_url:
            yield scrapy.Request(next_page_url, callback=self.parse)


    def parse_lyrics(self, response):
        info = response.xpath('//*[@class="songinfo"]/p/text()').extract()
        name =  response.xpath('//*[@id="lyric"]/h2/text()').extract()

        yield{
            'name' : name,
            'info': info
        }

Possible duplicate of [Scrapy: Following pagination link to scrape data](https://stackoverflow.com/questions/52246009/scrapy-following-pagination-link-to-scrape-data) — Joaquin, Sep 25 '18 at 14:57
Actually, the indentation was right; I accidentally posted it in two parts. It's fine now, you can check, sir. — Abhijeet Pal, Sep 25 '18 at 15:10

Adrien Blanquer · Accepted Answer · 2018-09-25T16:36:49.453

5

The problem is that `next_page_url` is a list, but the request needs a single URL as a string. Use `extract_first()` instead of `extract()` in `next_page_url = response.xpath('//li[@class="next right"]/a/@href').extract()`.

UPDATE

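You also have to `import scrapy`, since you are calling `yield scrapy.Request(next_page_url, callback=self.parse)`. The code only imports `Request` and `FormRequest` from `scrapy.http`, so the name `scrapy` itself is not defined.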

edited Sep 25 '18 at 16:36

answered Sep 25 '18 at 15:23


Thanks, sir. And this ` next_page_url = response.xpath('//li[@class="next right"]/a/@href').extract() if next_page_url: yield scrapy.Request(next_page_url, callback=self.parse)` should be outside the for loop, right? – Abhijeet Pal Sep 25 '18 at 15:45
Yes. If not, you will request the `next_page_url` once for every URL you request from the page. – Adrien Blanquer Sep 25 '18 at 15:48
Sir, I indented the code the way you told me, but the `next_page_url` line is showing an indentation error, even though everything looks fine here and it's outside the loop too. – Abhijeet Pal Sep 25 '18 at 16:05
I edited your post with the correct indentation. Copy and paste it, and double-check the indentation. – Adrien Blanquer Sep 25 '18 at 16:08
Dear sir, I did exactly what you said, but Scrapy only crawled the first page and then stopped. – Abhijeet Pal Sep 25 '18 at 16:24
Have you checked that the XPath is correct? That you correctly get a URL? What is the website you are scraping? – Adrien Blanquer Sep 25 '18 at 16:26
Thanks a ton sir you are a true legend – Abhijeet Pal Sep 25 '18 at 16:44
