
I am trying to scrape all the data in this table. However, its last row is a "Load More" table row that I do not know how to trigger. So far I have tried two approaches, and neither worked:

  1. I tried to click on the row itself:
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'html.parser')

table = soup.find('table', {"class": "competition-leaderboard__table"})

i = 0
for team in table.find_all('tbody'):
    rows = team.find_all('tr')
    for row in rows:
        i = i + 1
        if i == 51:
            row.click()  # the 51st row is the "Load More" row; this call raises the error

        # the scraping code for the first 50 elements

The code above throws an error saying that "'NoneType' object is not callable".

The second thing I tried, which also did not work, was to get the "Load More" table row by its class and click on it:

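from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get(url)

# Locate the "Load More" wrapper by its class name and click it
load_more = driver.find_element(By.CLASS_NAME, 'competition-leaderboard__load-more-wrapper')
load_more.click()

soup = BeautifulSoup(driver.page_source, 'html.parser')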

The code above also did not work.

So my question is: how can I make Python click on the "Load More" table row? From the HTML structure of the site, "Load More" does not seem to be a clickable button.

grigyan

1 Answer


In your code, you have to accept the cookies first; then you can click the 'Load more' button.

CSS selectors are the most suitable in this case.

import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Selenium 4 style: the chromedriver path is passed through a Service object
driver = webdriver.Chrome(service=Service('/snap/bin/chromium.chromedriver'))
driver.implicitly_wait(10)
driver.get('https://www.kaggle.com/c/coleridgeinitiative-show-us-the-data/leaderboard')

wait = WebDriverWait(driver, 30)
# Wait until the cookie-consent button is clickable, then accept the cookies
cookie_selector = ".sc-pAyMl.dwWbEz .sc-AxiKw.kOAUSS>.sc-AxhCb.gsXzyw"
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, cookie_selector))).click()
# With the cookies accepted, click the "Load more" row
driver.find_element(By.CSS_SELECTOR, ".competition-leaderboard__load-more-count").click()
time.sleep(10)  # Added so you can see that both buttons were clicked
driver.quit()  # quit() closes every window and ends the session

I tested this snippet and it clicked the desired button. Note that I've added WebDriverWait in order to wait until the first button is clickable.

UPDATE: I added time.sleep(10) so you could see that both buttons are clicked.
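
If one click does not reveal the whole table, the same click can be repeated until the "Load more" row is gone. Below is a minimal sketch of that idea, not something tested against the site: `click_until_gone` is a hypothetical helper, and it assumes the page removes or hides the row once every entry is loaded. It only relies on `driver.find_elements`, which returns an empty list when nothing matches.

```python
import time

CSS = "css selector"  # the string value behind selenium's By.CSS_SELECTOR


def click_until_gone(driver, selector, max_clicks=100, pause=1.0):
    """Click the element matching `selector` until it no longer shows up.

    Hypothetical helper; returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        # find_elements returns [] instead of raising when nothing matches
        buttons = driver.find_elements(CSS, selector)
        if not buttons or not buttons[0].is_displayed():
            break
        buttons[0].click()
        clicks += 1
        time.sleep(pause)  # crude pause so the next batch of rows can render
    return clicks
```

You would call it after accepting the cookies, for example `click_until_gone(driver, ".competition-leaderboard__load-more-count")`, and only then pass `driver.page_source` to BeautifulSoup as in the question. The `max_clicks` cap is there so the loop cannot run forever if the assumption about the row disappearing turns out to be wrong.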

vitaliis
  • I tried to use your solution with my code and it does not seem to work. Could you please take a look? – grigyan May 04 '21 at 13:08
  • I think there is an issue with the CSS selector part of the code. Could you please explain how they work as there are multiple components nested in one another? – grigyan May 04 '21 at 13:42
  • `.` goes before a class name, and `>` selects a direct child. Use `.class1.class2` to match one element that has both class names; `.class1>.class2` matches a `class2` element that is a direct child of a `class1` element. – vitaliis May 04 '21 at 15:15
  • I have tested it and it worked, I'll check a little later again. – vitaliis May 04 '21 at 15:16
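
To make the selector rules from the comments concrete, here is a small sketch using BeautifulSoup's `select()`, which accepts the same CSS syntax. The HTML and the class names `outer`, `wide` and `inner` are made up for illustration and have nothing to do with the Kaggle page.

```python
from bs4 import BeautifulSoup

# Made-up markup: one div carries two classes, the other only one
html = """
<div class="outer wide">
  <span class="inner" id="child">direct child</span>
  <p><span class="inner" id="grandchild">nested deeper</span></p>
</div>
<div class="outer" id="only-outer"></div>
"""
soup = BeautifulSoup(html, "html.parser")

# .outer.wide (no space): a single element that has BOTH classes
both = soup.select(".outer.wide")

# .outer.wide .inner (space): any descendant with class "inner"
descendants = [tag["id"] for tag in soup.select(".outer.wide .inner")]

# .outer.wide>.inner: only direct children with class "inner"
children = [tag["id"] for tag in soup.select(".outer.wide>.inner")]

print(len(both), descendants, children)
```

The selector in the answer combines all three forms: two classes on one element, a space for "somewhere inside", and `>` for "directly inside".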