Non-Greedy Regex
tag php

Question

I am trying to get the qualifier for each instance of part number 1AMTB00186 from the HTML below. I need it to return "4cyl 2.3L - F23A1, Balance Shaft" and "4cyl 2.3L - F23A1, CAM". I believe my regex is greedy, but I cannot figure out how to make it non-greedy. It always returns the first qualifier, "2.3L L4, Engine-F23A1". I am using:

$partno = "1AMTB00186";

$pattern_short ='{<td\s+class="qualifier"\s*>.*<div>([^<]+)</div>.*' . $partno . '}sU';
$matchcount = preg_match_all($pattern_short, $data, $matches);

<tr>
<tr id="61" class="findme">
<td class="productName">
<h3>Air and Fuel Delivery - Fuel Pumps and Related Components</h3>
<br>Electric Fuel</td>
<td class="qualifier"><div>2.3L L4, Engine-F23A1</div></td>
<td class="partNum">1AMFP00020</td>
</tr>
<tr id="62" class="odd findme">
<td class="productName">
<h3>Air and Fuel Delivery - Fuel Pumps and Related Components</h3>
<br>Electric Fuel</td>
<td class="qualifier"><div>3.0L V6, Engine-J30A1</div></td>
</tr>
<tr id="63" class="findme">
<td class="productName">
<h3>Belts - Timingbelts</h3>
<br>Timingbelt</td>
<td class="qualifier"><div>4cyl 2.3L - F23A1, Balance Shaft</div></td>
<td class="partNum">1AMTB00186</td>
</tr>
<tr id="64" class="odd findme">
<td class="productName">
<h3>Belts - Timingbelts</h3>
<br>Timingbelt</td>
<td class="qualifier"><div>4cyl 2.3L - F23A1, CAM</div></td>
<td class="partNum">1AMTB00244</td>
</tr>
</tr>
<tr id="63" class="findme">
<td class="productName">
<h3>Belts - Timingbelts</h3>
<br>Timingbelt</td>
<td class="qualifier"><div>4cyl 2.3L - F23A1, CAM</div></td>
<td class="partNum">1AMTB00186</td>
</tr>
<tr id="65" class="findme">
<td class="productName">
<h3>Belts - Timingbelts</h3>
<br>Timingbelt</td>
<td class="qualifier"><div>V6 3.0L - J30A1, CAM</div></td>
<td class="partNum">1AMTB00286</td>
</tr>
<tr id="66" class="odd findme">
<td class="productName">
<h3>Brakes - Disc Brake Pad and Hardware Kit</h3>
<br>Front; 7345-D465 Ceramic</td>
<td class="qualifier"><div>L4 2.3L</div></td>
<td class="partNum">1AMV300465</td>
</tr>

Thank You

score 2 · Answer 1 · edited May 23 '17 at 12:28

In all seriousness, please stop trying to parse large blocks of HTML code using regex. It's the wrong tool for the job.

Instead, PHP has a perfectly good DOM parser built in. There's a really good explanation of how to use it here: how to use dom php parser (and plenty of other tutorials around if you look).

In short, you need something like this:

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$query = '//tr/td[@class="partNum" and text() = "1AMTB00186"]/preceding-sibling::td[@class="qualifier"]';
foreach ($xpath->query($query) as $qualifier) {
    echo $qualifier->nodeValue, PHP_EOL;
}

The XPath $query explained:

Match all TD elements with the class "qualifier" that precede a sibling TD element with the class "partNum" and the content "1AMTB00186", where that TD is a direct child of a TR element.

An alternative way to write that XPath would be:

//tr/td[
    @class="qualifier" and following-sibling::td[
        @class="partNum" and text() = "1AMTB00186"
    ]
]
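
To try the alternative query on its own, here is a minimal sketch that runs it against a trimmed copy of the table from the question. The three-row sample and the $qualifiers array are only there for illustration; the DOM calls are the same ones used above.

```php
<?php
// Trimmed sample of the question's table: one unrelated row and
// two rows carrying the part number we are looking for.
$html = <<<'HTML'
<table>
<tr><td class="qualifier"><div>2.3L L4, Engine-F23A1</div></td><td class="partNum">1AMFP00020</td></tr>
<tr><td class="qualifier"><div>4cyl 2.3L - F23A1, Balance Shaft</div></td><td class="partNum">1AMTB00186</td></tr>
<tr><td class="qualifier"><div>4cyl 2.3L - F23A1, CAM</div></td><td class="partNum">1AMTB00186</td></tr>
</table>
HTML;

libxml_use_internal_errors(true); // keep libxml warnings out of the output
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select the qualifier cell of every row whose partNum cell matches.
$query = '//tr/td[@class="qualifier" and following-sibling::td['
       . '@class="partNum" and text() = "1AMTB00186"]]';

$qualifiers = [];
foreach ($xpath->query($query) as $td) {
    $qualifiers[] = trim($td->nodeValue);
}

print_r($qualifiers);
```

Because the match is evaluated per row, a row with a different part number can never contribute its qualifier, which is what the regex in the question could not guarantee.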

That works. However, I made a mistake in my original post: there is another line of code before the part number that makes it not work.
4cyl 2.2L - F22B1, Balance Shaft
1AMTB00186 — Chris Chessey, May 03 '13 at 14:26
@ChrisChessey change `text()` to `descendant-or-self::*/text()`. Also see http://schlitt.info/opensource/blog/0704_xpath.html — Gordon, May 03 '13 at 14:36
That only returns one result. Sorry I'm not super familiar with this stuff. — Chris Chessey, May 03 '13 at 15:10
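
The extra markup mentioned in the comments is not shown, so the following is only a sketch under an assumption: that the part number sits inside a nested element (a hypothetical span or a tag here) and may be surrounded by whitespace. Instead of the descendant-or-self::*/text() comparison suggested above, it swaps in normalize-space(.), which compares the whole text content of the partNum cell after trimming it. Whether this fits the real page depends on markup that is not visible in this thread.

```php
<?php
// HYPOTHETICAL markup: the real page's extra markup is not shown in the
// thread. Here the part number is wrapped in a nested element, once
// with surrounding whitespace and once without.
$html = <<<'HTML'
<table>
<tr>
<td class="qualifier"><div>4cyl 2.3L - F23A1, Balance Shaft</div></td>
<td class="partNum">
  <span>1AMTB00186</span>
</td>
</tr>
<tr>
<td class="qualifier"><div>4cyl 2.3L - F23A1, CAM</div></td>
<td class="partNum"><a href="#">1AMTB00186</a></td>
</tr>
<tr>
<td class="qualifier"><div>V6 3.0L - J30A1, CAM</div></td>
<td class="partNum">1AMTB00286</td>
</tr>
</table>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// normalize-space(.) takes the cell's full text content (including text
// inside nested elements) and strips leading/trailing whitespace.
$query = '//tr/td[@class="partNum" and normalize-space(.) = "1AMTB00186"]'
       . '/preceding-sibling::td[@class="qualifier"]';

$found = [];
foreach ($xpath->query($query) as $qualifier) {
    $found[] = trim($qualifier->nodeValue);
}

print_r($found);
```

If the partNum cell contains other text besides the part number, an exact comparison like this will not match; contains(., "1AMTB00186") would be a looser alternative, at the cost of also matching longer part numbers that include that string.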