I have an HTML document and am trying to extract hrefs for certain table rows. When rendered, the table has several columns. Every name is a hyperlink, but I only want to capture the hyperlinks for rows that have a blank value in the fourth column. The fourth column contains the termination date, and I am interested only in active (non-terminated) employees.

Here is a subset of the HTML response:

<tr id="r6" >                                    
<td>
   <a href="benefits.asp?SK=177646822&STYPE=ELNAME&QRY=a">111-11-1111</a>
</td>
<td >
   <a href="benefits.asp?SK=177646822&STYPE=ELNAME&QRY=a">Lastname</a>
</td>
<td nowrap="nowrap" >
   <a href="benefits.asp?SK=177646822&STYPE=ELNAME&QRY=a">Firstname</a>
</td>
<td nowrap="nowrap" >
   &nbsp;
</td>
<td>
   <a href="benefits.asp?SK=177646822&STYPE=ELNAME&QRY=a">743</a>
</td>
</tr>

<tr id="r7" >                                    
<td>
   <a href="benefits.asp?SK=177646782&STYPE=ELNAME&QRY=a">222-22-2222</a>
</td>
<td >
   <a href="benefits.asp?SK=177646782&STYPE=ELNAME&QRY=a">Ignore</a>
</td>
<td nowrap="nowrap" >
   <a href="benefits.asp?SK=177646782&STYPE=ELNAME&QRY=a">This</a>
</td>
<td nowrap="nowrap" >
   <a href="benefits.asp?SK=177646782&STYPE=ELNAME&QRY=a">7/12/2010</a>
</td>
<td>
   <a href="benefits.asp?SK=177646782&STYPE=ELNAME&QRY=a">1070</a>
</td>
</tr>

The first table row above (id=r6) is missing a date field in column 4, which is present in the second one. So I am trying to extract the href of the first but not the second. In other words, "give me the first href of each table row which has &nbsp; in column 4."

This in FirePath gives me all the hrefs in the table: //table[@id="searchResults"]//@href

Thanks

1 Answer

The answer depends on the environment you are using XPath in. Specifically, it depends on how non-breaking spaces are encoded. In XSLT, for example, the expression would look like

//tr[contains(td[4],'&#160;')]/td[1]/a/@href

Input (slightly modified)

<?xml version="1.0"?>
<!DOCTYPE root [
    <!ENTITY nbsp "&#160;">
]>
<root>
<tr id="r6" >                                    
<td>
   <a href="YES">111-11-1111</a>
</td>
<td >
   <a href="benefits.asp?SK=177646822STYPE=ELNAMEQRY=a">Lastname</a>
</td>
<td nowrap="nowrap" >
   <a href="benefits.asp?SK=177646822STYPE=ELNAMEQRY=a">Firstname</a>
</td>
<td nowrap="nowrap" >
   &nbsp;
</td>
<td>
   <a href="benefits.asp?SK=177646822STYPE=ELNAMEQRY=a">743</a>
</td>
</tr>

<tr id="r7" >                                    
<td>
   <a href="benefits.asp?SK=177646782STYPE=ELNAMEQRY=a">222-22-2222</a>
</td>
<td >
   <a href="benefits.asp?SK=177646782STYPE=ELNAMEQRY=a">Ignore</a>
</td>
<td nowrap="nowrap" >
   <a href="benefits.asp?SK=177646782STYPE=ELNAMEQRY=a">This</a>
</td>
<td nowrap="nowrap" >
   <a href="benefits.asp?SK=177646782STYPE=ELNAMEQRY=a">7/12/2010</a>
</td>
<td>
   <a href="benefits.asp?SK=177646782STYPE=ELNAMEQRY=a">1070</a>
</td>
</tr>
</root>

Stylesheet

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="text" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />

    <xsl:template match="/">
          <xsl:value-of select="//tr[contains(td[4],'&#160;')]/td[1]/a/@href"/>
    </xsl:template>

</xsl:transform>

Output

YES
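
Outside XSLT, the same filter can be sketched in a general-purpose language. The following is only an illustration, not part of the stylesheet above: it assumes Python with the standard-library `xml.etree.ElementTree` parser, a trimmed-down sample in which the non-breaking space is written as `&#160;` (so no entity declaration is needed), and a hypothetical helper name `active_hrefs`. After parsing, the non-breaking space is the single character U+00A0, which is what the check looks for.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample modelled on the input above. The non-breaking space is
# written as the character reference &#160; so that no DTD is required.
SAMPLE = """<root>
<tr id="r6">
<td><a href="YES">111-11-1111</a></td>
<td><a href="other">Lastname</a></td>
<td><a href="other">Firstname</a></td>
<td>
   &#160;
</td>
<td><a href="other">743</a></td>
</tr>
<tr id="r7">
<td><a href="NO">222-22-2222</a></td>
<td><a href="other">Ignore</a></td>
<td><a href="other">This</a></td>
<td><a href="other">7/12/2010</a></td>
<td><a href="other">1070</a></td>
</tr>
</root>"""

NBSP = "\u00a0"  # what &#160; (and HTML's &nbsp;) becomes once parsed


def active_hrefs(xml_text):
    """Return the first-column href of every row whose fourth cell contains a non-breaking space."""
    hrefs = []
    root = ET.fromstring(xml_text)
    for row in root.iter("tr"):
        cells = row.findall("td")
        if len(cells) < 4:
            continue  # row too short to have a fourth column
        # Concatenate all text inside the fourth cell, including nested elements.
        fourth_text = "".join(cells[3].itertext())
        if NBSP in fourth_text:
            link = cells[0].find("a")
            if link is not None and link.get("href") is not None:
                hrefs.append(link.get("href"))
    return hrefs


print(active_hrefs(SAMPLE))  # prints ['YES']
```

Like the XPath expression, this tests for the presence of a non-breaking space rather than for an empty cell, so it relies on terminated rows never containing one in column 4.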
Mathias Müller