I'd like to create a regex pattern that matches an entire self-closing HTML tag in a string. It will be used in a PHP preg_replace call that removes all self-closing tags that are normally not self-closing (e.g. div, span) from an HTML DOM string.

Here's an example. In the string:

'<div id="someId"><div class="someClass" /></div>'

I would like to get the match:

'<div class="someClass" />'

But I keep getting no match at all or this match:

'<div id="someId"><div class="someClass" />'

I have tried the following regex patterns and various combinations of them:

A simple regex pattern with the dot wildcard and excluding ">":

~<div.*?[^>].*?.*?/>~

A negative lookahead regex:

~<div(?!.*?>.*?)/>~

A negative lookbehind regex:

~<div.*?(?<!>).*?/>~

What am I missing?

Marcin Orlowski
Rene Jorgensen

3 Answers

Use a parser approach instead:

<?php

$html = <<<DATA
<div id="someId">
    <div class="someClass" />
</div>
DATA;

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);

// Select every div whose class attribute is exactly "someClass".
$divs = $xpath->query("//div[@class='someClass']");
foreach ($divs as $div) {
    // do something useful with each matching div here
}

?>

This sets up the DOM and looks for the div in question (via an xpath expression).
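
As a minimal sketch of what "something useful" could look like, the loop below removes each matching div and serializes the document again. Removing the node is an assumption on my part, based on the question's goal. The call to libxml_use_internal_errors() is also my addition, to keep parser warnings quiet. Note that the HTML parser does not record whether a tag was written as self-closing, so this selects by class just like the code above.

```php
<?php

$html = <<<DATA
<div id="someId">
    <div class="someClass" />
</div>
DATA;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Copy the node list into an array so removals cannot disturb the iteration.
$matches = iterator_to_array($xpath->query("//div[@class='someClass']"));
foreach ($matches as $div) {
    // Detach the matching div from its parent.
    $div->parentNode->removeChild($div);
}

// Serialize the modified document back to an HTML string.
$result = $dom->saveHTML();
echo $result;
```

The exact whitespace in the output depends on how the installed libxml version parses the "/>", so it is safer to check the result for the presence or absence of specific elements than to compare whole strings.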

Jan
  • The above was merely an example, which I should have been more clear about. But I wasn't aware about those DOM functionalities in PHP you mention above, so thanks for that :) – Rene Jorgensen Oct 04 '17 at 12:58

It seems I was overcomplicating things.

For my example, this pattern yields the correct result:

~<div[^>]+?/>~

'div' can be replaced by a group of alternatives, e.g. (div|span), to cover additional tags if needed.
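
A rough sketch of that idea with preg_replace follows. The tag list and the sample string are made up for illustration. The pattern is a slight variation on the one above: it uses [^>]*? so that a bare <div/> also matches, and \b so that a short name such as p does not match a longer tag name like pre.

```php
<?php

// Hypothetical sample input and tag list, for illustration only.
$html = '<div id="someId"><span /><div class="someClass" /></div>';
$tags = ['div', 'span', 'p'];

// Build an alternation such as (?:div|span|p) from the tag list.
// \b makes sure the tag name ends there; [^>]*? stops at the first "/>".
$pattern = '~<(?:' . implode('|', $tags) . ')\b[^>]*?/>~i';

// Remove every self-closing occurrence of the listed tags.
$clean = preg_replace($pattern, '', $html);

echo $clean; // prints: <div id="someId"></div>
```

Like the original pattern, this assumes that no attribute value contains a > character or the sequence />.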

Rene Jorgensen

Use the following regex:

<div[^<]*\/>

This regex just checks that there is no < inside the self-closing tag. This will be a problem if < is used inside the tag (e.g. in a quoted string).

To permit a < that appears inside a quoted string:

<div(?:[^<]*["'][^"']*["'][^<]*)\/>
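
As far as I can tell, the pattern above needs at least one quoted value and can still run across an earlier opening tag, because [^<]* also accepts ">". A possible alternative, offered only as a sketch, consumes each quoted value as a single unit and stops at the first "/>" outside quotes. The sample string is hypothetical.

```php
<?php

// Hypothetical input: attribute values contain "<" and ">".
$html = '<div id="a<b"><div class="x>y" /></div>';

// Inside the tag, accept either a double-quoted value, a single-quoted
// value, or any single character that is not ">" or a quote.
// The lazy *? ends the match at the first "/>" found outside quotes.
$pattern = <<<'REGEX'
~<div\b(?:[^>"']|"[^"]*"|'[^']*')*?/>~
REGEX;

$clean = preg_replace($pattern, '', $html);

echo $clean; // prints: <div id="a<b"></div>
```

This still assumes reasonably well-formed attributes with balanced quotes. For arbitrary HTML a parser remains the safer choice.
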
Huntro