
While trying to scrape county data from multiple Politico state web pages, such as the Washington page used below, I concluded that the best method was to first click the button that expands the county list and then grab the table body's data (when present). However, my attempt at clicking the button failed:

from bs4 import BeautifulSoup as bs
import requests
from splinter import Browser

state_page_url = "https://www.politico.com/2020-election/results/washington/"
executable_path = {'executable_path': 'chrome-driver/chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False)
browser.visit(state_page_url)
    
state_soup = bs(browser.html, 'html.parser')
reveal_button = state_soup.find('button', class_='jsx-3713440361')

if (reveal_button == None):
    # Steps to take when the button isn't present
    # ...
    pass
else:
    reveal_button.click()

The error is raised by my click() call in the else-branch: "TypeError: NoneType object is not callable". This doesn't make sense to me, since I thought the if-statement guaranteed that reveal_button was not a NoneType. Am I misinterpreting the error message, how reveal_button was set, or what I'm working with after making state_soup?

Joe
  • `state_county_reveal_button != reveal_button`, though I would have expected an error about state_county_reveal_button not being declared based on the code above, so maybe you are using Jupyter Notebook or something similar that remembers prior names in the session. – QHarr Jan 14 '21 at 06:02
  • You are correct about running it in Jupyter Notebook. I corrected the code above so that pitfall wasn't present and ran the code again in Jupyter Notebook. The error still appears for the `click()` call, but at least now I know it's not a mismatched variable. – Joe Jan 16 '21 at 20:28
  • Are you sure the error is coming from the else statement? It might be related to the code in the if part. If the problem were in the else part, it should have been something like "NoneType does not have any method named click". – nCoder Jan 16 '21 at 20:37
  • This _jsx-3713440361_ looks dynamic. Have you confirmed it is present in the launched browser? Also, can you change test to `if reveal_button is None:` – QHarr Jan 16 '21 at 21:05
  • @nCoder The error message had pointed to the `click()` line in the else-statement, and I had included a print message before the `click()` to confirm it. – Joe Jan 16 '21 at 22:58
  • @QHarr I had reloaded the web page across multiple states on multiple dates and the class _jsx-3713440361_ was consistently the button's class. I switched from `== None` to `is None` and got the same error message. – Joe Jan 16 '21 at 23:01
  • Side note: Why are you trying to call .click on a soup tag? It is not a method of the tag object (which is static). – QHarr Jan 17 '21 at 06:00
  • @QHarr The intent was to click the button within the browser so that the county table would expand to show all county results (which is what I was after). It sounds like I misunderstood what `state_soup.find()` is doing. I thought it could interact directly with the page, but based on the [documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) and my code it looks like I was trying to `click()` a `NavigableString` object. Based on [this discussion](https://stackoverflow.com/questions/23679480/clicking-link-using-beautifulsoup-in-python) it sounds like I'll need Selenium. – Joe Jan 19 '21 at 01:58
  • Or an equivalent tool with Python – Joe Jan 19 '21 at 02:10
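
For what it's worth, the error in the question can be reproduced without a browser. The following is a minimal sketch, assuming a made-up HTML snippet in place of the Politico page (only the class name comes from the question). `find()` returns a `Tag` here, and BeautifulSoup treats attribute access such as `.click` as shorthand for searching for a child `<click>` tag. That search finds nothing and yields `None`, so calling it raises the `TypeError`.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page; only the class name comes from the question.
html = '<div><button class="jsx-3713440361">Show all</button></div>'
soup = BeautifulSoup(html, "html.parser")

button = soup.find("button", class_="jsx-3713440361")
print(button is None)  # the button is found, so the else-branch would run

# Attribute access on a Tag is shorthand for find(): button.click searches
# for a child <click> element, and this button has none.
click_attr = button.click
print(click_attr)

try:
    button.click()  # equivalent to calling the None returned above
except TypeError as err:
    error_message = str(err)
    print(error_message)
```

This would explain why switching from `== None` to `is None` made no difference: the `None` in the error message is the result of `reveal_button.click`, not `reveal_button` itself.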

1 Answer


Based on the comment thread on the question and a solution to a similar question, I arrived at the following fix:

from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.common.by import By

# Navigate to the page and look for the button that expands the county list
state_page_url = "https://www.politico.com/2020-election/results/alabama/"
# Note: Selenium 4 deprecates executable_path in favor of a Service object
driver = webdriver.Chrome(executable_path='chrome-driver/chromedriver.exe')
driver.get(state_page_url)
button_list = driver.find_elements(By.CLASS_NAME, 'jsx-3713440361')

if not button_list:
    # Actions to take when no button is found
    # ...
    pass
else:
    button_list[-1].click() # The index was determined through trial/error specific to the web page

    # Now grab the table and its data from the updated page
    state_soup = bs(driver.page_source, 'html.parser')
    state_county_results_table = state_soup.find('tbody', class_='jsx-3713440361')

Note that this required Selenium for navigating and interacting with the page, while BeautifulSoup4 was only used to parse the resulting HTML for the information I needed.
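
Once the `tbody` has been located, its rows still need to be pulled out. One possible way to do that is sketched below, assuming the table uses ordinary `tr`/`td` markup. The helper name `extract_rows` and the sample HTML are hypothetical, and the county names and numbers are placeholders, not real results.

```python
from bs4 import BeautifulSoup


def extract_rows(page_html, tbody_class):
    """Return each row of the matching <tbody> as a list of cell strings."""
    soup = BeautifulSoup(page_html, "html.parser")
    tbody = soup.find("tbody", class_=tbody_class)
    if tbody is None:
        return []  # no matching table body on this page
    rows = []
    for tr in tbody.find_all("tr"):
        # Collect the text of header and data cells, trimmed of whitespace
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        rows.append(cells)
    return rows


# Placeholder markup and values, standing in for driver.page_source
sample_html = """
<table>
  <tbody class="jsx-3713440361">
    <tr><td>County A</td><td>100</td></tr>
    <tr><td>County B</td><td>200</td></tr>
  </tbody>
</table>
"""
sample_rows = extract_rows(sample_html, "jsx-3713440361")
print(sample_rows)
```

With Selenium, `page_html` would be `driver.page_source` read after the click, so that the expanded county rows are included.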

Joe