Here is some HTML from http://chem.sis.nlm.nih.gov/chemidplus/rn/75-07-0, as shown in Google Chrome. I want to parse this page for a project.
<div id="names">
<h2>Names and Synonyms</h2>
<div class="ds"><button class="toggle1Col"title="Toggle display between 1 column of wider results and multiple columns.">↔</button>
<h3 id="yui_3_18_1_3_1434394159641_407">Name of Substance</h3>
<ul>
<li id="ds2">
<div>Acetaldehyde</div>
</li>
</ul>
</div>
I wrote a Python script to grab the name under one of the sections, but it doesn't return the name. I think the problem is my XPath query. Any suggestions?
from lxml import html
import requests
import csv
names1 = []
page = requests.get('http://chem.sis.nlm.nih.gov/chemidplus/rn/75-07-0')
tree = html.fromstring(page.text)
# Grab the name data
names = tree.xpath('//*[@id="yui_3_18_1_3_1434380225687_700"]')
# Print the name data
print 'Names: ', names
# Add the result list as a single row
names1.append(names)
# Print the number of rows
print len(names1)
# Write the rows to a CSV file
b = open('testchem.csv', 'wb')
a = csv.writer(b)
a.writerows(names1)
b.close()
print "The end"
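One thing I noticed while writing this up: the id in my query (ending in 225687_700) does not match the id in the HTML above (ending in 159641_407), so the yui_ ids appear to change between page loads. Below is a minimal sketch of a query that avoids the generated id and anchors on id="names" and the heading text instead. It runs against a trimmed copy of the snippet above (the button is left out and I added the closing tag for the outer div), not the live page, so it assumes requests receives the same markup that Chrome shows, which I have not confirmed. The SNIPPET name is just for illustration.

```python
from lxml import html

# Trimmed copy of the HTML shown above (hypothetical test input, not the live page).
SNIPPET = """
<div id="names">
<h2>Names and Synonyms</h2>
<div class="ds">
<h3 id="yui_3_18_1_3_1434394159641_407">Name of Substance</h3>
<ul>
<li id="ds2">
<div>Acetaldehyde</div>
</li>
</ul>
</div>
</div>
"""

tree = html.fromstring(SNIPPET)

# Find the "Name of Substance" heading inside id="names", then take the text
# of each <div> in the first <ul> that follows it. Ending the query with
# text() returns strings instead of element objects.
names = tree.xpath(
    '//div[@id="names"]//h3[normalize-space()="Name of Substance"]'
    '/following-sibling::ul[1]/li/div/text()'
)
names = [n.strip() for n in names]
print(names)
```

If this is the right idea, the same query string could replace the one passed to tree.xpath in my script, and each name would then need to be written as its own row in the CSV.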