
I have been struggling with this issue for a couple of days and could really use some help. I am trying to scrape Google business information with Python, BeautifulSoup and Selenium, and I want to extract the business description that is available for some of the listings.

Not all of the description text is shown, so I need to click “More”. That is where the problem is: no matter what I do, I can’t seem to click it. I tried:

  • Waiting after I get url with Selenium so elements load
  • Getting element by class
  • Getting element by xpath
  • Clicking element via js executed code
  • Checking if element is in an iframe (it seems it is not)
  • Setting browser to max size, switching the browser's headless option on and off
  • Switching between Firefox and Chrome

EDIT: Code I tried using:

from urllib.parse import quote

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# company and location come from elsewhere in the script (not shown)
url = 'https://www.google.com/search?q=' + quote(''.join(company) + ' ' + ''.join(location))
options = webdriver.FirefoxOptions()
options.headless = True
# These look like Chrome-style switches; Firefox may not honor them
options.add_argument("--lang=en-GB")
options.add_argument("--window-size=1100,1000")
options.add_argument('--user-agent="Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; Microsoft; Lumia 640 XL LTE) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Mobile Safari/537.36 Edge/12.10166"')
browser = webdriver.Firefox(executable_path='C:/geckodriver.exe', options=options)
browser.maximize_window()
wait = WebDriverWait(browser, 10)
browser.get(url)  # load the search results page
wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@href and .='More']"))).click()
# browser.find_element_by_class_name('bJpcZ').click()
html = browser.page_source
browser.close()
soup = BeautifulSoup(html, 'lxml')

If anyone knows a solution, please share it :)

Thresh Bot

1 Answer

driver.maximize_window()
wait = WebDriverWait(driver, 10)
driver.get("https://www.google.com/search?rlz=1C1NDCM_enCA792CA792&sxsrf=APq-WBsY3Q1E1ge_7PuFaovaxQ_Orvk8-w:1645162032562&q=dungeness+pest+control&spell=1&sa=X&ved=2ahUKEwjfuLGUwoj2AhUaHDQIHUtpCOkQBSgAegQIARAy&biw=1366&bih=663&dpr=1")  # load the search results page
# wait until the "More" link is clickable, then click it
wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@href and .='More']"))).click()

Simply click the a tag whose text is "More".

However, there is also a //div[@data-long-text] element, so instead of clicking you could read the full text directly with .get_attribute("data-long-text").

Imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
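
Since the question already parses browser.page_source after the click, the data-long-text idea can also be tried on the saved HTML without clicking anything. Below is a minimal sketch using only the standard library's html.parser (swapped in here so the example has no dependencies; with BeautifulSoup, a lookup along the lines of soup.find('div', attrs={'data-long-text': True}) should be equivalent). The class name LongTextFinder, the helper extract_long_text and the sample markup are all hypothetical, and it assumes the attribute is present in the page source at all.

```python
from html.parser import HTMLParser


class LongTextFinder(HTMLParser):
    """Collect the value of every data-long-text attribute found on a <div>."""

    def __init__(self):
        super().__init__()
        self.long_texts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names
        if tag != "div":
            return
        for name, value in attrs:
            if name == "data-long-text" and value:
                self.long_texts.append(value)


def extract_long_text(html):
    """Return the first data-long-text value in html, or None if there is none."""
    finder = LongTextFinder()
    finder.feed(html)
    return finder.long_texts[0] if finder.long_texts else None


# Hypothetical markup standing in for browser.page_source
sample_html = (
    '<div data-long-text="Full business description.">'
    'Full business... <a href="#">More</a></div>'
)
print(extract_long_text(sample_html))
```

In a real run, the string passed in would be browser.page_source; if the function returns None, the attribute was not in the page and clicking "More" is still needed.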
Arundeep Chohan
  • I added the code to the question. When I try this, I get selenium.common.exceptions.TimeoutException: Message: Stacktrace: WebDriverError@chrome://remote/content/shared/webdriver/Errors.jsm:183:5 NoSuchElementError@chrome://remote/content/shared/webdriver/Errors.jsm:395:5 element.find/<@chrome://remote/content/marionette/element.js:300:16 – Thresh Bot Feb 18 '22 at 18:26
  • It's most likely the headless setting. – Arundeep Chohan Feb 19 '22 at 23:08
  • Ahhh, I figured it out: a cookie consent popup was appearing and causing the trouble! All I had to do was agree to the terms. – Thresh Bot Feb 20 '22 at 00:27