Match image tag not nested in an anchor tag using regular expression

Question

How would I match images that are not nested inside an anchor tag using a regular expression?

Here is what I want:

No match: <a href="index.html"><img src="images/default.jpg" /></a>

Match: <div><img src="images/default.jpg" /></div>

Match: <img src="images/default.jpg" />

I'm no good at regex, but this is what I've come up with so far, which doesn't work:

[^<a[^>]*>]<img.*?/>[^</a>]

I couldn't get lookarounds to work, since PHP wants lookbehind patterns to have a fixed length.

Ben Axnick · Accepted Answer · 2012-11-06T06:50:10.397

Much of your difficulty comes from the fact that HTML is not a regular language; see Coding Horror: Parsing Html the Cthulhu Way.

Consider using a query expression language powerful enough to process (X)HTML, or just using the DOM programmatically to fetch all image tags and then exclude those with <a> ancestors.

In PHP 5, I believe you can use DOMXPath, which makes it as simple as:

$generated_string = '<a href="index.html"><img src="images/inside_a.jpg" /></a>' .
                    '<div><img src="images/inside_div.jpg" /></div>' .
                    '<img src="images/inside_nothing.jpg" />';

$doc = new DOMDocument();
$doc->loadHTML($generated_string);
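$xpath = new DOMXPath($doc);

// Select every <img> that has no <a> anywhere among its ancestors,
// so images nested deeper inside a link are excluded as well.
$elements = $xpath->query("//img[not(ancestor::a)]");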

foreach ($elements as $element){
  echo $doc->saveXML($element) . "\n";
}

This code would give the output:

<img src="images/inside_div.jpg"/>
<img src="images/inside_nothing.jpg"/>

Not a problem, use ``$doc->loadHTML($generated_string)`` in that case. — Ben Axnick, Nov 06 '12 at 06:20
Updated my example for operating on a string instead of an HTML file — Ben Axnick, Nov 06 '12 at 06:39

score -1 · Answer 2 · answered Nov 06 '12 at 07:14

<img[^>]*>(?![^<]*</a>)

user1105430
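
For anyone trying the pattern above, here is a small sketch of how it behaves on the three samples from the question, plus one assumed extra sample (an image followed by a `<span>` inside the link) that is not from the original post. The helper name `countImgMatches` is made up for illustration.

```php
<?php
// The pattern from the answer, wrapped in "~" delimiters so that the
// "/" in "</a>" does not need escaping.
$pattern = '~<img[^>]*>(?![^<]*</a>)~';

// Hypothetical helper: number of times $pattern matches in $html.
function countImgMatches($pattern, $html)
{
    return preg_match_all($pattern, $html, $unused);
}

// Samples from the question.
echo countImgMatches($pattern, '<a href="index.html"><img src="images/default.jpg" /></a>') . "\n"; // 0
echo countImgMatches($pattern, '<div><img src="images/default.jpg" /></div>') . "\n";               // 1
echo countImgMatches($pattern, '<img src="images/default.jpg" />') . "\n";                          // 1

// Assumed extra sample: another tag sits between the image and "</a>".
// The lookahead only scans up to the next "<", so this image is matched
// even though it is inside a link.
echo countImgMatches($pattern, '<a href="#"><img src="x.jpg" /><span>caption</span></a>') . "\n";   // 1
```

The last case shows the limit of the lookahead: it only rules out an image immediately followed by `</a>` (with optional text in between), which is why a DOM-based approach is more reliable for arbitrary markup.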
