
I am trying to get the specific qualifier for each instance of part #1AMTB00186 from the HTML below. It should return "4cyl 2.3L - F23A1, Balance Shaft" and "4cyl 2.3L - F23A1, CAM". I believe my regex is greedy, but I cannot figure out how to make it non-greedy. It always returns the first qualifier, "2.3L L4, Engine-F23A1". I am using:

$partno = "1AMTB00186";

$pattern_short ='{<td\s+class="qualifier"\s*>.*<div>([^<]+)</div>.*' . $partno . '}sU';
$matchcount = preg_match_all($pattern_short, $data, $matches);
<tr>
<tr id="61" class="findme">
<td class="productName">
<h3>Air and Fuel Delivery - Fuel Pumps and Related Components</h3>
<br>Electric Fuel</td>
<td class="qualifier"><div>2.3L L4, Engine-F23A1</div></td>
<td class="partNum">1AMFP00020</td>
</tr>
<tr id="62" class="odd findme">
<td class="productName">
<h3>Air and Fuel Delivery - Fuel Pumps and Related Components</h3>
<br>Electric Fuel</td>
<td class="qualifier"><div>3.0L V6, Engine-J30A1</div></td>
</tr>
<tr id="63" class="findme">
<td class="productName">
<h3>Belts - Timingbelts</h3>
<br>Timingbelt</td>
<td class="qualifier"><div>4cyl 2.3L - F23A1, Balance Shaft</div></td>
<td class="partNum">1AMTB00186</td>
</tr>
<tr id="64" class="odd findme">
<td class="productName">
<h3>Belts - Timingbelts</h3>
<br>Timingbelt</td>
<td class="qualifier"><div>4cyl 2.3L - F23A1, CAM</div></td>
<td class="partNum">1AMTB00244</td>
</tr>
</tr>
<tr id="63" class="findme">
<td class="productName">
<h3>Belts - Timingbelts</h3>
<br>Timingbelt</td>
<td class="qualifier"><div>4cyl 2.3L - F23A1, CAM</div></td>
<td class="partNum">1AMTB00186</td>
</tr>
<tr id="65" class="findme">
<td class="productName">
<h3>Belts - Timingbelts</h3>
<br>Timingbelt</td>
<td class="qualifier"><div>V6 3.0L - J30A1, CAM</div></td>
<td class="partNum">1AMTB00286</td>
</tr>
<tr id="66" class="odd findme">
<td class="productName">
<h3>Brakes - Disc Brake Pad and Hardware Kit</h3>
<br>Front; 7345-D465 Ceramic</td>
<td class="qualifier"><div>L4 2.3L</div></td>
<td class="partNum">1AMV300465</td>
</tr>

Thank You

Bergi

1 Answer


In all seriousness, please stop trying to parse large blocks of HTML code using regex. It's the wrong tool for the job.

Instead, PHP has a perfectly good DOM parser built in. There's a really good explanation of how to use it here: how to use dom php parser (and plenty of other tutorials around if you look).

In short, you need something like this:

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$query = '//tr/td[@class="partNum" and text() = "1AMTB00186"]/preceding-sibling::td[@class="qualifier"]';
foreach ($xpath->query($query) as $qualifier) {
    echo $qualifier->nodeValue, PHP_EOL;
}

The XPath $query explained:

Match all TD elements with the class "qualifier" that precede a sibling TD element with the class "partNum" and the content "1AMTB00186", where those TD elements are direct children of a TR element.

An alternative way to write that XPath would be:

//tr/td[
    @class="qualifier" and following-sibling::td[
        @class="partNum" and text() = "1AMTB00186"
    ]
]
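
To check that the two XPath forms select the same cells, here is a minimal sketch that runs both against three rows trimmed from the question's HTML (the id attributes are dropped). The helper name queryText and the variable names $sampleHtml, $stepForm and $predicateForm are made up for this illustration; they do not appear in the original post.

```php
<?php
// Rows trimmed from the question's HTML; $sampleHtml is a made-up name.
$sampleHtml = <<<'HTML'
<table>
<tr class="findme">
<td class="productName"><h3>Belts - Timingbelts</h3><br>Timingbelt</td>
<td class="qualifier"><div>4cyl 2.3L - F23A1, Balance Shaft</div></td>
<td class="partNum">1AMTB00186</td>
</tr>
<tr class="odd findme">
<td class="productName"><h3>Belts - Timingbelts</h3><br>Timingbelt</td>
<td class="qualifier"><div>4cyl 2.3L - F23A1, CAM</div></td>
<td class="partNum">1AMTB00244</td>
</tr>
<tr class="findme">
<td class="productName"><h3>Belts - Timingbelts</h3><br>Timingbelt</td>
<td class="qualifier"><div>4cyl 2.3L - F23A1, CAM</div></td>
<td class="partNum">1AMTB00186</td>
</tr>
</table>
HTML;

// Hypothetical helper: returns the trimmed text of every node the query matches.
function queryText(string $html, string $query): array
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->nodeValue);
    }
    return $texts;
}

// Form 1: find the matching partNum cell, then step back to its qualifier sibling.
$stepForm = '//tr/td[@class="partNum" and text() = "1AMTB00186"]'
          . '/preceding-sibling::td[@class="qualifier"]';

// Form 2: select the qualifier cell directly, filtered by a following partNum sibling.
$predicateForm = '//tr/td[@class="qualifier" and following-sibling::td['
               . '@class="partNum" and text() = "1AMTB00186"]]';

foreach ([$stepForm, $predicateForm] as $query) {
    echo implode(PHP_EOL, queryText($sampleHtml, $query)), PHP_EOL;
}
```

On this sample, both forms should yield the "Balance Shaft" and "CAM" qualifiers for part 1AMTB00186 and skip the row for 1AMTB00244.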
Community
Spudley
  • That works. However I made a mistake in my original post. There is another line of code in there before the part number that makes it not work.
    4cyl 2.2L - F22B1, Balance Shaft
    1AMTB00186
    – Chris Chessey May 03 '13 at 14:26
  • @ChrisChessey change `text()` to `descendant-or-self::*/text()`. Also see http://schlitt.info/opensource/blog/0704_xpath.html – Gordon May 03 '13 at 14:36
  • That only returns one result. Sorry I'm not super familiar with this stuff. – Chris Chessey May 03 '13 at 15:10
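
The markup from the comment above is not shown, so the following is only a sketch of Gordon's suggestion under an assumption: that the part number sits inside an extra element (here an `<a>` tag) within the partNum cell. The helper name matchQualifiers and the variables $nestedHtml, $directQuery and $nestedQuery are hypothetical.

```php
<?php
// ASSUMPTION: the part number is wrapped in an <a> element inside the cell.
// The real markup from the comment is not available.
$nestedHtml = <<<'HTML'
<table>
<tr>
<td class="qualifier"><div>4cyl 2.3L - F23A1, Balance Shaft</div></td>
<td class="partNum"><a href="#">1AMTB00186</a></td>
</tr>
<tr>
<td class="qualifier"><div>4cyl 2.3L - F23A1, CAM</div></td>
<td class="partNum"><a href="#">1AMTB00186</a></td>
</tr>
</table>
HTML;

// Hypothetical helper: returns the trimmed text of every node the query matches.
function matchQualifiers(string $html, string $query): array
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->nodeValue);
    }
    return $texts;
}

// text() only sees text nodes that are direct children of the td, so it misses
// a part number that is nested one level deeper.
$directQuery = '//tr/td[@class="partNum" and text() = "1AMTB00186"]'
             . '/preceding-sibling::td[@class="qualifier"]';

// descendant-or-self::*/text() also looks at text nodes inside nested elements.
$nestedQuery = '//tr/td[@class="partNum" and descendant-or-self::*/text() = "1AMTB00186"]'
             . '/preceding-sibling::td[@class="qualifier"]';

echo count(matchQualifiers($nestedHtml, $directQuery)), PHP_EOL;
echo implode(PHP_EOL, matchQualifiers($nestedHtml, $nestedQuery)), PHP_EOL;
```

With this assumed markup the direct text() comparison matches nothing, while the descendant-or-self variant matches both rows. If the real page still returns a single result, its structure likely differs from what is assumed here.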