I have this example HTML:
<div class="_bns--table">
<table class="bns--table" border="0" cellspacing="0" cellpadding="0" width="737">
<tbody><tr><td width="151" colspan="2" rowspan="2"><p><b>Field Name</b></p>
</td>
<td width="161"><p><b>GG Text (TXT)</b></p>
</td>
<td width="142"><p><b>Excellent Text (TXT)</b></p>
</td>
<td width="142"><p><b>Text (Text)</b></p>
</td>
<td width="142"><p><b>Super Text (TXT)</b></p>
</td>
</tr><tr><td width="161"><p>Super Instruction</p>
</td>
<td width="142"><p>Super Instruction</p>
</td>
<td width="142"><p>Super Instruction</p>
</td>
<td width="142"><p>Super Instruction</p>
</td>
</tr><tr><td width="76"><p>SUBMIT TO: </p>
</td>
<td width="76"><p><b>Intermediary Text</b></p>
</td>
<td width="161"><p><b>Q.W. Super Good Text</b></p>
<p><b>Address:</b> Long Dong Plaza New York, United States</p>
<p><b>Sample:</b> 001068967</p>
<p><b>TEXT CODE:</b> TEXTT33</p>
<p><b>SUPER EXAMPLE:</b> 031111521</p>
</td>
<td width="142"><p><b>The Company of Super Compania</b> International Company Division<br>
<b>Address:</b> 44 Wong Street West<br>
Toronto, Ontario, Canada<br>
<b>TEXT CODE: </b>BOBFFCDD</p>
</td>
<td width="142"><p><b>DGG Company, Belgium Company</b></p>
<p><b>Address:</b> Brussels, Belgium</p>
<p><b>Sample Number:</b> 201-0207080-43</p>
<p><b>TEXT CODE:</b> DDRUDEDD040</p>
</td>
<td width="142"><p><b>TDTT Company PLC</b><br>
<b>Address:</b> 8 Red Square Chicken Head, London, England, E15 8HQ <br>
<b>TEXT CODE:</b> BIBHGB77<br>
<b>Sample:</b> 47605627</p>
</td>
</tr><tr><td width="76"><p>LETTER TO: </p>
</td>
<td width="76"><p><b>Excellent Company</b></p>
</td>
<td width="586" colspan="4"><p><b>Superexamplecompany (Lols & Keks Ltd)</b></p>
<p><b>Address:</b> Somethingsuperimportant, Brothers and Sisters</p>
<p><b>TEXT Code:</b> BONTFQWE</p>
</td>
</tr><tr><td width="76"><p>:</p>
</td>
<td width="76"><p><b>Postal/ Courier's Information</b></p>
</td>
<td width="586" colspan="4"><p><b>Your Full Name or Your Company Full Name</b><br>
<b>Address:</b> including Street Number, Street Name, City, Province/State, Country, and Postal Code <br>
Your Sample <b>Code</b> and <b>Sample Number</b></p>
</td>
</tr></tbody></table>
</div>
What I need to do is match the element that meets all of the following criteria: it contains the words "example" and "sample" (both case-insensitive, whole words only), as well as a number at least 3 digits long. In the HTML above, only the following element meets these criteria:
<td width="161"><p><b>Q.W. Super Good Text</b></p>
<p><b>Address:</b> Long Dong Plaza New York, United States</p>
<p><b>Sample:</b> 001068967</p>
<p><b>TEXT CODE:</b> TEXTT33</p>
<p><b>SUPER EXAMPLE:</b> 031111521</p>
</td>
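To make clear what "matches" means here, the criteria can be restated as regular expressions over an element's text. This is a minimal Python sketch for illustration only, and it does not run inside Octoparse. `matches_criteria` is a hypothetical helper, and the sketch assumes that "a number at least 3 digits long" means a run of at least three consecutive digits.

```python
import re

# Hypothetical restatement of the criteria; not part of Octoparse or XPath.
EXAMPLE_WORD = re.compile(r"\bexample\b", re.IGNORECASE)  # whole word, any case
SAMPLE_WORD = re.compile(r"\bsample\b", re.IGNORECASE)    # whole word, any case
NUMBER_3_PLUS = re.compile(r"\d{3,}")  # assumption: 3+ consecutive digits


def matches_criteria(text: str) -> bool:
    """Return True if text satisfies all three criteria."""
    return bool(
        EXAMPLE_WORD.search(text)
        and SAMPLE_WORD.search(text)
        and NUMBER_3_PLUS.search(text)
    )


# Text content of the td element quoted above (markup stripped).
td_text = (
    "Q.W. Super Good Text\n"
    "Address: Long Dong Plaza New York, United States\n"
    "Sample: 001068967\n"
    "TEXT CODE: TEXTT33\n"
    "SUPER EXAMPLE: 031111521\n"
)

print(matches_criteria(td_text))                                  # True
print(matches_criteria("Superexamplecompany (Lols & Keks Ltd)"))  # False
```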
I have this huge XPath 1.0 expression:
//*[
  (
    contains(
      concat(
        ' ',
        translate(
          translate(., 'example', 'EXAMPLE'),
          ':,;.',
          ' '
        ),
        ' '
      ),
      ' EXAMPLE '
    ) and
    contains(
      concat(
        ' ',
        translate(
          translate(., 'sample', 'SAMPLE'),
          ':,;.',
          ' '
        ),
        ' '
      ),
      ' SAMPLE '
    )
  ) and
  translate(., translate(., '0123456789', ''), '') >= 3
]
[not(
  *[
    (
      contains(
        concat(
          ' ',
          translate(
            translate(., 'example', 'EXAMPLE'),
            ':,;.',
            ' '
          ),
          ' '
        ),
        ' EXAMPLE '
      ) and
      contains(
        concat(
          ' ',
          translate(
            translate(., 'sample', 'SAMPLE'),
            ':,;.',
            ' '
          ),
          ' '
        ),
        ' SAMPLE '
      )
    ) and
    translate(., translate(., '0123456789', ''), '') >= 3
  ]
)]
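To see what the sub-expressions above actually compute, the following Python sketch emulates XPath 1.0 `translate()`. That function maps characters one by one and deletes any character in the second argument that has no counterpart in the third. The helpers `xpath_translate` and `xpath_number` are hypothetical. `xpath_number` is a simplified version of the string-to-number conversion that XPath 1.0 applies when a string is compared to a number with `>=`. Whether line breaks actually precede the words depends on the whitespace in the page's DOM, which the pasted HTML may not reproduce exactly.

```python
def xpath_translate(s: str, src: str, dst: str) -> str:
    """Emulate XPath 1.0 translate(s, src, dst).

    Each character of s that occurs in src is replaced by the character at
    the same position in dst, or deleted if dst is shorter than src. If a
    character occurs more than once in src, its first occurrence wins.
    """
    mapping = {}
    for i, ch in enumerate(src):
        if ch not in mapping:
            mapping[ch] = dst[i] if i < len(dst) else ""
    return "".join(mapping.get(ch, ch) for ch in s)


def xpath_number(s: str) -> float:
    """Simplified XPath 1.0 number(): digit strings parse, anything else is NaN."""
    try:
        return float(s)
    except ValueError:
        return float("nan")


# ':' becomes a space, while ',', ';' and '.' are deleted (no counterpart in ' ').
print(repr(xpath_translate("Q.W. Super Good Text", ":,;.", " ")))      # 'QW Super Good Text'
print(repr(xpath_translate("SUPER EXAMPLE: 031111521", ":,;.", " ")))  # 'SUPER EXAMPLE  031111521'

# Line breaks are not in ':,;.', so they survive; ' SAMPLE ' (with a leading
# space) is not found when the word is preceded by a newline.
print(repr(xpath_translate("States\nSample: 001068967", ":,;.", " ")))  # 'States\nSample  001068967'

# translate(., translate(., '0123456789', ''), '') keeps only the digits...
text = "Sample 12 example"
non_digits = xpath_translate(text, "0123456789", "")
digits = xpath_translate(text, non_digits, "")
print(digits)  # 12
# ...and ">= 3" then compares the digits as a number, not as a length:
print(xpath_number(digits) >= 3)  # True, although there are only 2 digits
print(len(digits) >= 3)           # False
```

In this emulation, `>= 3` is true for "12" because 12 is at least 3. Testing the number of digits would instead correspond to `string-length(...) >= 3`, which still counts all digits in the element rather than a single run. XPath 1.0's `normalize-space()` turns runs of spaces, tabs and line breaks into single spaces.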
It is supposed to select only the element (quoted above) that has no children matching the same criteria, but for some reason it selects the whole parent element <tr> instead. I need a query that matches only that single td element of this table, without restricting the query to a specific element type.
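The selection described above ("elements that match, none of whose child elements match") can be sketched in Python with the standard library's `xml.etree.ElementTree`. The excerpt is a simplified, well-formed subset of the table, with attributes and some cells dropped because the original HTML is not valid XML. The helpers `matches` and `innermost_matches` are hypothetical. The criteria are restated as regular expressions, assuming "3 digits" means three consecutive digits.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical restatement of the criteria from the question.
PATTERNS = [
    re.compile(r"\bexample\b", re.IGNORECASE),
    re.compile(r"\bsample\b", re.IGNORECASE),
    re.compile(r"\d{3,}"),  # assumption: 3+ consecutive digits
]


def matches(el: ET.Element) -> bool:
    """Apply the criteria to the element's full text content, like XPath's '.'."""
    text = "".join(el.itertext())
    return all(p.search(text) for p in PATTERNS)


def innermost_matches(root: ET.Element) -> list:
    """Matching elements with no matching child, i.e. //*[match][not(*[match])]."""
    return [el for el in root.iter()
            if matches(el) and not any(matches(child) for child in el)]


# Simplified, well-formed excerpt of the table (attributes and some cells dropped).
SNIPPET = """<table><tbody><tr>
<td><p>SUBMIT TO: </p></td>
<td><p><b>Q.W. Super Good Text</b></p>
<p><b>Sample:</b> 001068967</p>
<p><b>SUPER EXAMPLE:</b> 031111521</p></td>
<td><p><b>DGG Company, Belgium Company</b></p>
<p><b>Sample Number:</b> 201-0207080-43</p></td>
</tr></tbody></table>"""

root = ET.fromstring(SNIPPET)
for el in innermost_matches(root):
    # Print the tag and the first line of its text content.
    print(el.tag, "".join(el.itertext()).split("\n")[0])
# td Q.W. Super Good Text
```

Here the tr is rejected only because its second td child matches. If that td failed the test, as may be happening in Octoparse with the XPath above, nothing would rule out the tr, which would then be returned instead.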
It is a requirement to use XPath 1.0, because the software I'm using (Octoparse) doesn't support newer XPath versions.