How to Load more/show more pagination with scrapy-selenium

Question

I get a response, but the spider scrapes nothing!

import scrapy
from scrapy.selector import Selector
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from time import sleep

class ProductSpider(scrapy.Spider):

    name = "card"

    start_urls = ['https://examplesite.com']

    def __init__(self):
        self.driver = webdriver.Chrome()

    def parse(self, response):
        self.driver.get(response.url)
        actions = ActionChains(self.driver)

        while True:
            next = self.driver.find_elements_by_css_selector("button#show-more")

            if next:
                last_height = self.driver.execute_script("return document.body.scrollHeight")
                self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
                actions.move_to_element(next[0]).click().perform()
                
                
                lists= Selector(text=self.driver.page_source)
                
                for list in lists.xpath('//ul[@id="finder-table"]/li'):
                    yield{
                        'Name': list.xpath('.//*[@class="table-item-heading-product-name"]/span/strong/text()').get(),
                        'Title': list.xpath('.//*[@class="table-item-heading-product-name"]/span/text()').get()
                    }

            else:
                break

        self.driver.close()

score 1 · Answer 1 · edited Jun 10 '22 at 15:53

I guess you need to scroll to the "show more" button before clicking it, since it is outside the visible area of the page until you scroll down.
Also, it's better to locate the element by its class name than by its text.
Also, if no "show more" button is left, your code will throw an exception. So I used find_elements, which returns a list, instead of what you wrote. It does not throw: when nothing matches it returns an empty list and the loop exits normally, and when the button is found you use the first element of the returned list.
This is your code after I rebuilt it:


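import scrapy
from scrapy.selector import Selector
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from time import sleep

class ProductSpider(scrapy.Spider):

    name = "card"

    start_urls = ['https://examplesite.com']

    def __init__(self, *args, **kwargs):
        # Let scrapy.Spider finish its own setup before adding the browser
        super().__init__(*args, **kwargs)
        self.driver = webdriver.Chrome()

    def parse(self, response):
        self.driver.get(response.url)

        while True:
            # find_elements returns an empty list (no exception) when nothing matches
            buttons = self.driver.find_elements(By.CSS_SELECTOR, "button#show-more")

            if buttons:
                # Scroll to the bottom so the button is inside the viewport before clicking
                self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
                # Build a fresh action chain for every click
                ActionChains(self.driver).move_to_element(buttons[0]).click().perform()
                sleep(2)  # assumed delay so the new rows can load; tune it for the site

                # A Selenium WebElement has no .xpath(), so parse the rendered HTML with a Scrapy Selector
                page = Selector(text=self.driver.page_source)
                for item in page.xpath('//ul[@id="finder-table"]/li'):
                    yield {
                        'Name': item.xpath('.//*[@class="table-item-heading-product-name"]/span/strong/text()').get(),
                        'Title': item.xpath('.//*[@class="table-item-heading-product-name"]/span/text()').get()
                    }

            else:
                break

        # quit() ends the whole browser session, not only the current window
        self.driver.quit()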
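
A possible variation, sketched under assumptions rather than taken from the answer: the `MoveTargetOutOfBoundsException` reported in the comments below is raised by the pointer move, so this helper scrolls the button into view and clicks it through JavaScript, then polls until the number of rows grows. `load_all`, its parameters, and the `CSS` constant are hypothetical names. The helper only calls `find_elements` and `execute_script` on whatever driver object it receives.

```python
import time

# Same string as selenium.webdriver.common.by.By.CSS_SELECTOR
CSS = "css selector"


def load_all(driver, button_css, item_css, timeout=10.0, poll=0.25, max_clicks=100):
    """Click the "show more" button until it is gone or stops adding rows.

    Returns the number of clicks performed. All names here are illustrative.
    """
    clicks = 0
    while clicks < max_clicks:
        buttons = driver.find_elements(CSS, button_css)
        if not buttons:
            break  # empty list: nothing left to load

        before = len(driver.find_elements(CSS, item_css))

        # Scroll the button into view and click it from JavaScript,
        # so no pointer-move action is involved.
        driver.execute_script(
            "arguments[0].scrollIntoView({block: 'center'}); arguments[0].click();",
            buttons[0],
        )
        clicks += 1

        # Poll until at least one new row appears, or give up after `timeout` seconds.
        deadline = time.monotonic() + timeout
        while len(driver.find_elements(CSS, item_css)) <= before:
            if time.monotonic() > deadline:
                return clicks  # no new rows arrived; stop instead of looping forever
            time.sleep(poll)
    return clicks
```

In `parse`, it could be called as `load_all(self.driver, "button#show-more", "ul#finder-table > li")` before building `Selector(text=self.driver.page_source)` once, so that each row is yielded a single time rather than again after every click.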

edited Jun 10 '22 at 15:53 by Md. Fazlul Hoque
answered Jun 20 '21 at 12:09 by Prophet

Thanks for the clear explanation. Sorry, but it's still not working because of an attribute error: 'function' object has no attribute 'perform'. It also has some "self" mistakes. – Md. Fazlul Hoque Jun 20 '21 at 15:16
Try again with what I updated there. The `ActionChains` object has to be initialized with the Selenium webdriver object, which is normally called `driver`, so that is what I passed. In your setup it seems to be `self.driver` instead of `driver`, so that is what I changed now – Prophet Jun 20 '21 at 15:22
Hi, thanks for a great response. I had already added self to driver, and that is not where the trouble is. The main problem is 'perform()'. It's showing spider_exceptions: 'function' object has no attribute 'perform' – Md. Fazlul Hoque Jun 20 '21 at 17:02
Ah, sure. I already fixed that now. Forgot the `()` with click. See if it works now – Prophet Jun 20 '21 at 17:05
Hi Prophet. That was a Selector problem and it is already solved. Here is the new exception: raise exception_class(message, screen, stacktrace) selenium.common.exceptions.MoveTargetOutOfBoundsException: Message: move target out of bounds – Md. Fazlul Hoque Jun 20 '21 at 17:51
I added some code there. now it should work – Prophet Jun 20 '21 at 17:58
I can't imagine how this error can occur in the current code, at least with my code. The rest of your original code is your responsibility – Prophet Jun 20 '21 at 21:20
I already edited code according to your instruction. – Md. Fazlul Hoque Jun 20 '21 at 21:22
Look, I can't debug your code on my computer. I just asked which line of code gives the error. You can't even answer this simple question. Without that I can't help you. – Prophet Jun 20 '21 at 21:26
The error trace clearly shows the code throwing the error, BTW. It gives not only the line number but also the code itself, the method containing that code, etc. – Prophet Jun 20 '21 at 21:27
That means that your locators are not unique enough and the whole code is not reliable enough. Well, that is not surprising... – Prophet Jun 21 '21 at 04:52
