As the title states, I am parsing some HTML from http://chem.sis.nlm.nih.gov/chemidplus/name/acetone and want to extract some data, such as the Acetone listed under MeSH Heading. This follows on from my similar post, How to set up XPath query for HTML parsing?
<div id="names">
<h2>Names and Synonyms</h2>
<div class="ds">
<button class="toggle1Col" title="Toggle display between 1 column of wider results and multiple columns.">↔</button>
<h3>Name of Substance</h3>
<div class="yui3-g-r">
<div class="yui3-u-1-4">
<ul>
<li id="ds2">
<div>2-Propanone</div>
</li>
</ul>
</div>
<div class="yui3-u-1-4">
<ul>
<li id="ds3">
<div>Acetone</div>
</li>
</ul>
</div>
<div class="yui3-u-1-4">
<ul>
<li id="ds4">
<div>Acetone [NF]</div>
</li>
</ul>
</div>
<div class="yui3-u-1-4">
<ul>
<li id="ds5">
<div>Dimethyl ketone</div>
</li>
</ul>
</div>
</div>
<h3>MeSH Heading</h3>
<ul>
<li id="ds6">
<div>Acetone</div>
</li>
</ul>
</div>
</div>
Previously, on other pages, I would use mesh_name = tree.xpath('//*[text()="MeSH Heading"]/..//div')[1].text_content()
to extract the data, because those pages had similar structures. Now I see that is not always the case, as I didn't account for inconsistency between pages. So, is there a way to go to the node I want and then obtain its child element, in a way that works consistently across different pages?
Would doing tree.xpath('//*[text()="MeSH Heading"]//preceding-sibling::text()[1]')
work?
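For reference, one alternative I am considering is to anchor on the heading itself and take the element that immediately follows it, instead of indexing into every div under the heading's parent. This is only a minimal sketch: it assumes lxml, assumes the value always sits inside the first sibling element after the h3, and uses a trimmed copy of the HTML above. The helper name first_value_under is hypothetical.

```python
from lxml import html

# Trimmed copy of the snippet above, kept only for illustration.
SAMPLE = """
<div id="names">
  <h2>Names and Synonyms</h2>
  <div class="ds">
    <h3>Name of Substance</h3>
    <div class="yui3-g-r">
      <div class="yui3-u-1-4"><ul><li id="ds2"><div>2-Propanone</div></li></ul></div>
      <div class="yui3-u-1-4"><ul><li id="ds3"><div>Acetone</div></li></ul></div>
    </div>
    <h3>MeSH Heading</h3>
    <ul><li id="ds6"><div>Acetone</div></li></ul>
  </div>
</div>
"""


def first_value_under(tree, heading):
    """Return the text of the first div inside the element right after the <h3>.

    Hypothetical helper: assumes the value lives in the first sibling
    element following the heading. Returns None when nothing matches.
    """
    hits = tree.xpath(
        '//h3[normalize-space()=$heading]/following-sibling::*[1]//div',
        heading=heading,  # passed as an XPath variable, so no quoting issues
    )
    return hits[0].text_content().strip() if hits else None


tree = html.fromstring(SAMPLE)
mesh_name = first_value_under(tree, "MeSH Heading")
print(mesh_name)
```

Because the path starts from the h3 and only looks at its first following sibling, the divs under Name of Substance should not shift the result, but I have not checked this against every page layout on the site.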