Access Chrome DOM tree with python

Question

Using Chrome DevTools you can see the DOM tree of a page. Is there a way to access and pull out that tree using Python?

Why should you use Python? You can use client-side tools like JavaScript and jQuery to access the DOM. — voscausa, Sep 21 '12 at 13:48
@voscausa -- I want to parse and analyze the dynamic content with Python. — root, Sep 21 '12 at 14:13

score 5 · Accepted Answer · answered Sep 21 '12 at 15:35


The best way I found was to use selenium.webdriver:

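import selenium.webdriver as webdriver
import lxml.html as lh
# In recent lxml releases (5.2+) the cleaner ships separately;
# install it with "pip install lxml[html_clean]" for this import to work.
import lxml.html.clean as clean

browser = webdriver.Chrome()  # start a local Chrome session
try:
    browser.get("http://www.webpage.com")  # load the page; its JavaScript runs in Chrome
    content = browser.page_source  # HTML serialization of the current DOM
finally:
    browser.quit()  # close Chrome even if loading the page fails

cleaner = clean.Cleaner()  # by default removes scripts, comments and other unsafe content
content = cleaner.clean_html(content)
doc = lh.fromstring(content)  # parse into an lxml.html.HtmlElement tree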

doc now holds the DOM as an lxml.html.HtmlElement.
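Once doc is built, the ordinary lxml API can be used to query it. A minimal sketch, assuming a hypothetical inline HTML string stands in for browser.page_source (so no browser is needed) and skipping the Cleaner step:

```python
import lxml.html as lh

# Hypothetical markup standing in for browser.page_source
content = """
<html><body>
  <div id="main">
    <h1>Title</h1>
    <ul><li><a href="/a">A</a></li><li><a href="/b">B</a></li></ul>
  </div>
</body></html>
"""

doc = lh.fromstring(content)  # an lxml.html.HtmlElement for <html>

# XPath query: every link target inside the div with id="main"
hrefs = [str(h) for h in doc.xpath('//div[@id="main"]//a/@href')]

# Look up an element by id, then read the text of its <h1>
main = doc.get_element_by_id("main")
title = main.find(".//h1").text_content()

# Walk the whole tree depth-first, recording tag names
tags = [el.tag for el in doc.iter()]

print(hrefs)  # ['/a', '/b']
print(title)  # Title
```

The same calls work unchanged on a doc built from a real page_source; only the selectors would differ.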

root

Great! Used in http://stackoverflow.com/questions/43183736/beautifulsoup-does-not-returns-all-data/43191283#43191283 – Bill Bell Apr 03 '17 at 17:57

score 2 · Answer 2 · answered Sep 21 '12 at 15:25


Have you used the BeautifulSoup library? This section of the tutorial may answer your question: http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html#The Parse Tree

You will also need the Requests library to download the page:

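# Python 3 with BeautifulSoup 4 (pip install beautifulsoup4 requests)
from bs4 import BeautifulSoup
import requests

url = 'http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html'
page = requests.get(url)  # fetch the raw HTML sent by the server
soup = BeautifulSoup(page.content, "html.parser")  # name the parser explicitly
print(soup)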
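The parse tree BeautifulSoup builds can be navigated much like a DOM. A small sketch with the bs4 package, assuming a hypothetical inline snippet in place of a live request:

```python
from bs4 import BeautifulSoup

# Hypothetical page standing in for page.content
html = """
<html><head><title>Demo</title></head>
<body>
  <p class="intro">Hello <b>world</b></p>
  <p class="intro">Second</p>
</body></html>
"""

# "html.parser" is the parser from Python's standard library
soup = BeautifulSoup(html, "html.parser")

title = soup.title.string                    # text inside <title>
intro = soup.find_all("p", class_="intro")   # every <p class="intro">
first_bold = intro[0].b.get_text()           # child navigation by tag name
# Direct children of <body>; text nodes have name None and are skipped
children = [c.name for c in soup.body.children if c.name]

print(title)       # Demo
print(len(intro))  # 2
```

Note that this tree contains only what the server sent; anything a script would add later is absent.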

msunbot

@michellesun -- thank you for your answer, but unfortunately this only gets me the HTML, without the dynamic content generated by JavaScript. – root Sep 21 '12 at 15:37
@root: if dynamically-generated content is critical, you should add that requirement to your question. Be sure to describe when and how it is generated - if it requires user interaction in order to appear, that expands the scope of this considerably. – Shog9 Sep 21 '12 at 18:19
@Shog9 -- the question was about accessing the Chrome DOM tree. I did not feel the need to explain my reasons for doing so, as the question is specific enough. Accessing an HTML page with BS is somewhat different from what I asked for. Besides, dynamic content was mentioned in the comments. – root Sep 21 '12 at 22:38
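The distinction in this exchange can be shown without a browser: a static parser sees only the markup the server sent, not elements a script adds afterwards. A minimal sketch using Python's standard-library html.parser on a hypothetical page whose div with id "dynamic" is created only by JavaScript:

```python
from html.parser import HTMLParser

# Hypothetical server response: the "dynamic" div exists only after the script runs
RAW_HTML = """
<html><body>
  <div id="static">Sent by the server</div>
  <script>
    var d = document.createElement("div");
    d.id = "dynamic";
    document.body.appendChild(d);
  </script>
</body></html>
"""

class IdCollector(HTMLParser):
    """Records the id attribute of every start tag in the raw markup."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id":
                self.ids.append(value)

parser = IdCollector()
parser.feed(RAW_HTML)  # script bodies are treated as text, never executed
print(parser.ids)  # ['static']
```

A real browser, such as the Chrome session driven by Selenium in the accepted answer, executes the script, so its DOM would also contain the "dynamic" div.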
