How to select nodes in HTML with lxml?

Question

I have some HTML from http://chem.sis.nlm.nih.gov/chemidplus/rn/75-07-0 (from my previous post, "How to set up XPath query for HTML parsing?"). Now I want to add some conditional logic, because many of the other pages are similar but not all the same. Given this markup:

<div id="names">
<h2>Names and Synonyms</h2>
<div class="ds">
<button class="toggle1Col" title="Toggle display between 1 column of wider results and multiple columns.">&#8596;</button>
<h3>Name of Substance</h3>
<ul>
<li id="ds2"><div>Acetaldehyde</div></li>
</ul>
<h3>MeSH Heading</h3>
<ul>
<li id="ds3"><div>Acetaldehyde</div></li>
</ul>
</div>

In my Python script I would like to select the "Name of Substance" and "MeSH Heading" nodes and check whether they exist. If they do, I want to select the data under them; otherwise I want an empty string. Is there a way to do this in Python, like in JavaScript where I would write something like Node myNode = doc.DocumentNode.SelectNode("//*[text()='Name of Substance']")?

from lxml import html
import requests 
import csv 
page = requests.get('http://chem.sis.nlm.nih.gov/chemidplus/rn/75-07-0')
tree = html.fromstring(page.text) 

if <"Name of Substance" is there>:   # what should this check be?
    chem_name = tree.xpath('//*[text()="Name of Substance"]/..//div')[0].text_content()
else:
    chem_name = ""
if <"MeSH Heading" is there>:
    mesh_name = tree.xpath('//*[text()="MeSH Heading"]/..//div')[1].text_content()
else:
    mesh_name = ""

names1 = [chem_name, mesh_name]
with open('testchem.csv', 'wb') as myfile:
    wr = csv.writer(myfile) 
    wr.writerow(names1)

score 0 · Accepted Answer · answered Jun 29 '15 at 14:43

You can simply check whether "Name of Substance" or "MeSH Heading" appears in the text of the page and, if it does, select the contents.

from lxml import html
import requests
import csv
page = requests.get('http://chem.sis.nlm.nih.gov/chemidplus/rn/75-07-0')
tree = html.fromstring(page.text)

if "Name of Substance" in page.text:
    chem_name = tree.xpath('//*[text()="Name of Substance"]/..//div')[0].text_content()
else:
    chem_name = ""

if "MeSH Heading" in page.text:
    # '..//div' matches every <div> in the enclosing block, so [1] is the
    # second entry, which is the MeSH Heading value on this page
    mesh_name = tree.xpath('//*[text()="MeSH Heading"]/..//div')[1].text_content()
else:
    mesh_name = ""

names1 = [chem_name, mesh_name]
# Python 3: open the CSV file in text mode with newline='' (Python 2 used 'wb')
with open('testchem.csv', 'w', newline='') as myfile:
    wr = csv.writer(myfile)
    wr.writerow(names1)
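A variation on the same approach, sketched under the assumption that pages keep the <h3> heading followed by a <ul> layout shown in the question: lxml's xpath() always returns a list, so an empty result can serve as the "does it exist?" check instead of searching the raw page text. The helper name text_after_heading is hypothetical. It reads the first <div> in the <ul> immediately following the matching <h3>, and returns "" when the heading is absent. A trimmed copy of the question's markup stands in for the live page so the sketch runs offline.

```python
from lxml import html

# Trimmed copy of the markup from the question (assumption: real pages
# use the same <h3> heading followed by a <ul> of <li><div> entries).
SAMPLE = """
<div id="names">
<h2>Names and Synonyms</h2>
<div class="ds">
<h3>Name of Substance</h3>
<ul>
<li id="ds2"><div>Acetaldehyde</div></li>
</ul>
<h3>MeSH Heading</h3>
<ul>
<li id="ds3"><div>Acetaldehyde</div></li>
</ul>
</div>
</div>
"""


def text_after_heading(tree, heading):
    """Return the text of the first <div> in the <ul> directly after the
    <h3> whose text is `heading`, or "" if there is no such node."""
    # xpath() returns a list; an empty list means "not found".
    # $heading is an XPath variable, passed as a keyword argument,
    # which avoids quoting problems inside the expression.
    matches = tree.xpath(
        '//h3[normalize-space()=$heading]'
        '/following-sibling::*[1][self::ul]//div',
        heading=heading,
    )
    return matches[0].text_content().strip() if matches else ""


tree = html.fromstring(SAMPLE)
chem_name = text_after_heading(tree, "Name of Substance")
mesh_name = text_after_heading(tree, "MeSH Heading")
missing = text_after_heading(tree, "Systematic Name")  # not on this page
print([chem_name, mesh_name, missing])
```

Because each lookup is tied to its own heading, the result does not depend on the position index ([0] or [1]) of the entry within the block.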

Thanks, it worked! I'm a complete novice at programming in general and simple things like this are slowly coming together. — TimTom, Jun 29 '15 at 15:58
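Since many of the pages are similar but not identical, another option is to read every heading in the div.ds block into a dictionary once, then look up whichever names you need with a default for missing ones. This is a sketch under the same layout assumption as the question's markup. The function name sections_by_heading is hypothetical, and io.StringIO stands in for the output file; for a real file on Python 3, the csv documentation recommends opening it with newline="".

```python
import csv
import io
from lxml import html

# Trimmed copy of the question's markup (assumed layout: <h3> then <ul>).
SAMPLE = """
<div id="names">
<div class="ds">
<h3>Name of Substance</h3>
<ul><li id="ds2"><div>Acetaldehyde</div></li></ul>
<h3>MeSH Heading</h3>
<ul><li id="ds3"><div>Acetaldehyde</div></li></ul>
</div>
</div>
"""


def sections_by_heading(tree):
    """Map each <h3> heading text to the list of <div> texts in the <ul>
    that directly follows it."""
    sections = {}
    for h3 in tree.xpath('//div[@class="ds"]/h3'):
        heading = h3.text_content().strip()
        # Only accept a <ul> that is the very next sibling of this <h3>,
        # so a heading without a list does not borrow the next section's.
        divs = h3.xpath('following-sibling::*[1][self::ul]//div')
        sections[heading] = [div.text_content().strip() for div in divs]
    return sections


tree = html.fromstring(SAMPLE)
sections = sections_by_heading(tree)

# .get() with a default yields [] (joined to "") for absent headings.
wanted = ("Name of Substance", "MeSH Heading", "Systematic Name")
row = [", ".join(sections.get(name, [])) for name in wanted]

buffer = io.StringIO()
csv.writer(buffer).writerow(row)
print(buffer.getvalue())
```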

