
I need to scrape data from a website. One div stays hidden until you click a button on the page. When I fetch the HTML with code, the hidden div's content is missing, even though I can see it in the browser's "Inspect" panel.

The URL, code, and hidden div are shown below:

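import requests
import bs4

url = 'https://so.gushiwen.org/guwen/bookv_3694.aspx'
# Fetch the page as the server sends it (requests does not run JavaScript)
doc = requests.get(url)
# Parse and print the static HTML
print(bs4.BeautifulSoup(doc.text, "html.parser"))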

[Image: screenshot of the hidden div]

Dharman
Ling
  • If you know Selenium or Puppeteer, you can click the button so that the hidden div is revealed, and then read that div. – Tserenjamts Nov 05 '19 at 07:16
  • Does this answer your question? [Scraping hidden elements using BeautifulSoup](https://stackoverflow.com/questions/34546766/scraping-hidden-elements-using-beautifulsoup) – MikeMajara Nov 05 '19 at 07:51

1 Answer


You can use Selenium to locate the desired div by id and call send_keys('\n') on that element:

from selenium import webdriver
# Start Chrome using a local chromedriver binary
d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://so.gushiwen.org/guwen/bookv_3694.aspx')
# Send an Enter keypress to the target div
d.find_element_by_id('right2321').send_keys('\n')

Now, you can use BeautifulSoup to scrape your desired content via:

from bs4 import BeautifulSoup as soup
# Parse the browser's current DOM and read the text of the target div
content = soup(d.page_source, 'html.parser').find('div', {'id':'right2321'}).text
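
Note that an HTML parser does not care whether a div is hidden by CSS; it only sees text that is present in the HTML it is given. Here is a minimal sketch of that point, using only the standard library's `html.parser` so it runs without a browser (BeautifulSoup applies no CSS either). The two HTML strings are made up for illustration and are not copied from the real page; they assume the div is empty until the button click fills it. `DivTextExtractor` and `div_text` are hypothetical helper names.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class DivTextExtractor(HTMLParser):
    """Collect the text inside the element with a given id (hypothetical helper)."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0    # 0 means we are outside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1    # nested element inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1     # entering the target element

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def div_text(html, target_id):
    """Return the stripped text of the element whose id is target_id."""
    parser = DivTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# ASSUMPTION: made-up markup, not the real page source.
before_click = '<div id="right2321" style="display:none"></div>'
after_click = '<div id="right2321" style="display:none"><p>translated text</p></div>'

print(repr(div_text(before_click, "right2321")))  # prints ''
print(repr(div_text(after_click, "right2321")))   # prints 'translated text'
```

If the first result is what you get from the `requests` response, the content is simply not in the static HTML yet, which is why a browser-driven click is needed before parsing.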
Ajax1234
  • I ran the script above and got this error: ``` ElementNotInteractableException: Message: element not interactable ``` – Ling Nov 06 '19 at 02:27
  • I worked it out by clicking the button first and then reading the div:
    ```
    d.get('https://so.gushiwen.org/guwen/bookv_966.aspx')
    element = d.find_element_by_id('leftbtn784')
    element.click()
    hid = d.find_element_by_id('right784')
    print(hid.text)
    ```
    – Ling Nov 06 '19 at 11:55
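
On newer Selenium 4 releases, where `find_element_by_id` is no longer available, the same click-then-read approach might look like the sketch below. It assumes Chrome and a matching driver can be found on the machine, and that the button and div ids share a numeric suffix as in `leftbtn784` / `right784`. `element_ids` and `fetch_hidden_text` are hypothetical helper names, and the explicit waits are an addition that was not part of the original comment.

```python
def element_ids(suffix):
    """Build the (button id, hidden div id) pair for a numeric suffix.

    ASSUMPTION: both ids share the suffix, as in 'leftbtn784' / 'right784'.
    """
    return f"leftbtn{suffix}", f"right{suffix}"


def fetch_hidden_text(url, suffix, timeout=10):
    """Click the button, wait for the hidden div to become visible, return its text."""
    # Imported here so element_ids() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    button_id, div_id = element_ids(suffix)
    driver = webdriver.Chrome()  # assumes Chrome and its driver are available
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # Wait until the button can be clicked, then click it.
        wait.until(EC.element_to_be_clickable((By.ID, button_id))).click()
        # Wait until the previously hidden div is displayed.
        hidden = wait.until(EC.visibility_of_element_located((By.ID, div_id)))
        return hidden.text
    finally:
        driver.quit()  # always close the browser, even on errors


# Example call (needs a browser and network access):
# print(fetch_hidden_text('https://so.gushiwen.org/guwen/bookv_966.aspx', 784))
```

Waiting for visibility may not be enough if the page fills the div some time after showing it; in that case a wait on the div's text being non-empty would be the safer condition.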