I have some HTML from http://chem.sis.nlm.nih.gov/chemidplus/rn/75-07-0 (from my previous post, "How to set up XPath query for HTML parsing?"), and now I want to add some conditional logic, because many of the other pages are similar but not all the same. Given this HTML:
<div id="names">
<h2>Names and Synonyms</h2>
<div class="ds">
<button class="toggle1Col" title="Toggle display between 1 column of wider results and multiple columns.">↔</button>
<h3>Name of Substance</h3>
<ul>
<li id="ds2"><div>Acetaldehyde</div></li>
</ul>
<h3>MeSH Heading</h3>
<ul>
<li id="ds3"><div>Acetaldehyde</div></li>
</ul>
</div>
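In this markup each <h3> heading and the <ul> holding its values are siblings inside div.ds, so one way to cope with pages that carry different headings is to walk every <h3> and pair it with the first <ul> that follows it. A minimal sketch with lxml, run on a trimmed copy of the fragment above (the toggle button is left out, and the names dictionary is just an illustrative structure, not something the page defines):

```python
from lxml import html

# Trimmed copy of the fragment from the question (toggle button omitted)
SNIPPET = """<div id="names">
<h2>Names and Synonyms</h2>
<div class="ds">
<h3>Name of Substance</h3>
<ul>
<li id="ds2"><div>Acetaldehyde</div></li>
</ul>
<h3>MeSH Heading</h3>
<ul>
<li id="ds3"><div>Acetaldehyde</div></li>
</ul>
</div>
</div>"""

tree = html.fromstring(SNIPPET)

# Map every heading present on the page to the values listed under it
names = {}
for h3 in tree.xpath('//div[@class="ds"]/h3'):
    # The values for a heading sit in the first <ul> sibling after that <h3>
    items = h3.xpath('following-sibling::ul[1]/li/div')
    names[h3.text_content().strip()] = [d.text_content().strip() for d in items]

# A heading that a given page lacks is simply absent from the dict
mesh = names.get("MeSH Heading", [])
```

Because the dictionary only contains headings that actually occur, the same loop works unchanged on pages that have more, fewer, or differently ordered sections.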
In my Python script I would like to select the "Name of Substance" and "MeSH Heading" nodes and check whether they exist. If they do, I want the data under them; otherwise I want an empty string. Is there a way to do this in Python, similar to how I would write HtmlNode myNode = doc.DocumentNode.SelectSingleNode("//*[text()='Name of Substance']") in C#?
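A minimal sketch of the "select it if it exists, otherwise return an empty string" logic with lxml, assuming each heading is an <h3> followed by its own <ul> as in the fragment above. The helper name text_after_heading and the "Systematic Name" heading are hypothetical. The heading text is passed as an XPath variable ($heading) through keyword arguments to lxml's xpath(), which avoids quote-escaping problems:

```python
from lxml import html

def text_after_heading(tree, heading):
    """Hypothetical helper: text of the first <div> in the <ul> that follows
    the <h3> whose text is `heading`, or "" when no such node exists."""
    # $heading is bound from the keyword argument below
    nodes = tree.xpath(
        '//h3[normalize-space(.)=$heading]/following-sibling::ul[1]//div',
        heading=heading,
    )
    # xpath() returns a list; an empty list means the heading is not on the page
    return nodes[0].text_content().strip() if nodes else ""

# Trimmed copy of the fragment from the question
SNIPPET = """<div class="ds">
<h3>Name of Substance</h3>
<ul><li id="ds2"><div>Acetaldehyde</div></li></ul>
<h3>MeSH Heading</h3>
<ul><li id="ds3"><div>Acetaldehyde</div></li></ul>
</div>"""

tree = html.fromstring(SNIPPET)
chem_name = text_after_heading(tree, "Name of Substance")
mesh_name = text_after_heading(tree, "MeSH Heading")
# Hypothetical heading that this fragment does not contain
other_name = text_after_heading(tree, "Systematic Name")
```

The XPath equivalent of a single-node select is just taking element [0] of the returned list after checking that the list is non-empty.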
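from lxml import html
import requests
import csv

page = requests.get('http://chem.sis.nlm.nih.gov/chemidplus/rn/75-07-0')
tree = html.fromstring(page.text)

# xpath() returns a list, which is empty when the heading is missing.
# following-sibling::ul[1] is the list directly under each <h3>.
chem_divs = tree.xpath('//h3[text()="Name of Substance"]/following-sibling::ul[1]//div')
if chem_divs:
    chem_name = chem_divs[0].text_content()
else:
    chem_name = ''

mesh_divs = tree.xpath('//h3[text()="MeSH Heading"]/following-sibling::ul[1]//div')
if mesh_divs:
    mesh_name = mesh_divs[0].text_content()
else:
    mesh_name = ''

names1 = [chem_name, mesh_name]
# Python 3: open the CSV in text mode with newline='' for the csv module
with open('testchem.csv', 'w', newline='') as myfile:
    wr = csv.writer(myfile)
    wr.writerow(names1)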
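Since many pages share most headings, another option is to loop over a fixed list of headings and let XPath's string() function handle the missing case: string() of an empty node-set is the empty string, so an absent heading just produces a blank CSV cell. A sketch under the same <h3>/<ul> markup assumption; the two page fragments are made-up stand-ins for downloaded pages, and io.StringIO stands in for the output file:

```python
import csv
import io

from lxml import html

# Column order for the CSV: the same two headings as in the question
HEADINGS = ["Name of Substance", "MeSH Heading"]

def extract_row(page_html):
    """Return one CSV row: the first value under each heading, or "" if absent."""
    tree = html.fromstring(page_html)
    row = []
    for heading in HEADINGS:
        # string() of an empty node-set is "", so no if/else is needed
        value = tree.xpath(
            'string(//h3[normalize-space(.)=$h]/following-sibling::ul[1]//div)',
            h=heading,
        )
        row.append(str(value).strip())
    return row

# Made-up stand-ins for two downloaded pages; the second has no MeSH Heading
pages = [
    '<div class="ds"><h3>Name of Substance</h3><ul><li><div>Acetaldehyde</div></li></ul>'
    '<h3>MeSH Heading</h3><ul><li><div>Acetaldehyde</div></li></ul></div>',
    '<div class="ds"><h3>Name of Substance</h3><ul><li><div>Acetaldehyde</div></li></ul></div>',
]

buf = io.StringIO()  # stands in for the real output file
writer = csv.writer(buf)
writer.writerow(HEADINGS)
for page in pages:
    writer.writerow(extract_row(page))
```

For a real file on Python 3, replace the StringIO with open('testchem.csv', 'w', newline='').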