PHP Regex : Ignore closing tag of HTML if

Question

I can't seem to get this to work and I was hoping for some help.

I'm trying to capture the contents of a specific div. (Please spare me the DOM talk; for this specific purpose it doesn't really come into play.)

The problem is that it doesn't work if there is another div with attributes before it on the same line. I tried to match only when there's no `>` between `<div` and `class="myClass"`, but I think I'm doing it wrong.

I'm still pretty mystified by regex.

/<div(?!>).*?class="myClass".*?>(.*?)<\/div>/mi

(Semi-)working example: http://regex101.com/r/cW0lW6
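A minimal PHP sketch of the failure, using a hypothetical one-line input (not from the question). `(?!>)` only checks the single character right after `<div`, so the lazy `.*?` can still run past the first tag's `>` until it reaches `class="myClass"`. Without the `s` modifier, `.` does not match newlines, which is why this only bites when the earlier div is on the same line; the `m` modifier has no effect here because the pattern contains no `^` or `$`.

```php
<?php
// Hypothetical input: an unrelated div with attributes precedes the target on one line.
$html = '<div class="other">skip</div><div class="myClass">keep</div>';

// The pattern from the question.
$pattern = '/<div(?!>).*?class="myClass".*?>(.*?)<\/div>/mi';

preg_match($pattern, $html, $m);

// (?!>) passes at the first "<div" (the next character is a space), and the lazy
// .*? then expands past that tag's ">" until class="myClass" turns up.
echo $m[0], "\n"; // the whole input, starting at the first <div
echo $m[1], "\n"; // keep
```

The captured group is still right here; it is the full match (`$m[0]`) that swallows the preceding div, which matters if the full match is removed or replaced, for example with `preg_replace`.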

Technically I'm using it on a PHP string sent via POST over Ajax, for the new version of my CMS. It's for a good cause, I assure you. – Casey Dwayne, Jan 15 '14 at 22:55

score 0 · Accepted Answer · answered Jan 15 '14 at 22:51


Try

/<div(?=\s)(?:(?!>).)+?class="myClass".*?>(.*?)<\/div>/si
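A quick PHP check of this pattern, using hypothetical inputs (an unrelated div on the same line, and a target whose content spans several lines). Because `(?:(?!>).)+?` refuses to step over a `>`, the attempt that starts at the first `<div` dies at the end of that opening tag, and the engine moves on to the next `<div`.

```php
<?php
// Hypothetical inputs (not from the question).
$oneLine = '<div class="other">skip</div><div class="myClass">keep</div>';
$multi   = "<div id=\"a\" class=\"myClass\">\n  line 1\n  line 2\n</div>";

// The accepted answer's pattern:
//   (?=\s)        "<div" must be followed by whitespace, so "<divider" is skipped
//   (?:(?!>).)+?  one or more characters, none of them ">", so the match cannot
//                 leave the opening tag before reaching class="myClass"
//   s modifier    "." also matches newlines, so multi-line content is captured
$pattern = '/<div(?=\s)(?:(?!>).)+?class="myClass".*?>(.*?)<\/div>/si';

preg_match($pattern, $oneLine, $m);
echo $m[0], "\n"; // <div class="myClass">keep</div>

// The captured content keeps its surrounding newlines and indentation.
preg_match($pattern, $multi, $m2);
echo trim($m2[1]), "\n";
```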


Cool. Care to elaborate on `(?:(?!>).)+?`? Why the `.` (any character) and the `+?`? – Casey Dwayne Jan 15 '14 at 22:59
It's a fancy way to write the character class `[^>]+`, but what a class can't do is `(?:(?!some junk string).)+`. Neither is the `?` necessary, nor is `[^>]` entirely correct, but that's for another day and would require a 15-page regex. The `.*?`s aren't correct either; I thought I would just start out with some basics. – Jan 15 '14 at 23:17
There's all sorts of hooey in these regexes. There are many open questions, like a closing tag amongst `[^>]*`; it's endless, but doable. Most people just want a quick-and-dirty solution and don't realize the hidden gotchas. – Jan 15 '14 at 23:23
True. Those endless "ifs and buts" are why I rarely use it, but it does come in very handy for parsing small strings. I'm sure in time it will all make sense. Thanks! – Casey Dwayne Jan 15 '14 at 23:44
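The two points in these comments, sketched in PHP with hypothetical inputs. For a single excluded character, `(?:(?!>).)+` and `[^>]+` match the same text, but only the lookahead form (often called a "tempered" token) can exclude a whole string such as `</div>`. The last part shows one hidden gotcha, under our reading of "closing tag amongst `[^>]*`": a `>` inside a quoted attribute value is valid HTML, yet it ends the opening tag early for both forms.

```php
<?php
// 1) Single excluded character: tempered token and negated class agree.
$tag = '<div id="x" class="myClass">';
preg_match('/<div(?:(?!>).)+>/', $tag, $viaLookahead);
preg_match('/<div[^>]+>/', $tag, $viaClass);
echo $viaLookahead[0] === $viaClass[0] ? "same\n" : "different\n"; // same

// 2) A class excludes single characters only; a tempered token can exclude a
//    whole string. Here the content may contain "<", but may not run past "</div>".
$html    = '<div class="myClass">a < b > c</div><div>other</div>';
$byClass = preg_match('~<div class="myClass">([^<]*)</div>~', $html); // 0: [^<] stops at "<" in "a < b"
preg_match('~<div class="myClass">((?:(?!</div>).)*)</div>~s', $html, $m);
echo $m[1], "\n"; // a < b > c

// 3) Gotcha: ">" inside a quoted attribute value. Both [^>] and (?:(?!>).)
//    treat it as the end of the opening tag, so neither pattern matches.
$tricky   = '<div title="a>b" class="myClass">keep</div>';
$simple   = preg_match('~<div[^>]+?class="myClass"[^>]*>(.*?)</div>~si', $tricky);
$accepted = preg_match('/<div(?=\s)(?:(?!>).)+?class="myClass".*?>(.*?)<\/div>/si', $tricky);
echo $simple, ' ', $accepted, "\n"; // 0 0
```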

score 0 · Answer 2 · edited May 23 '17 at 10:26


You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML.

See: RegEx match open tags except XHTML self-contained tags

I suggest using QueryPath for parsing XML and HTML in PHP. It uses much the same syntax as jQuery, only on the server side.
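For comparison, a parser-based sketch. It uses PHP's built-in `DOMDocument` and `DOMXPath` (the DOM extension linked in the comments below) rather than QueryPath, whose API is not shown here. The input string is hypothetical, and the exact `@class="myClass"` test mirrors the regexes, so it would not match `class="foo myClass"` either.

```php
<?php
// Hypothetical fragment: an unrelated div precedes the one we want.
$html = '<div class="other">skip</div><div class="myClass"><p>keep</p></div>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);            // libxml adds implied <html> and <body> elements
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Exact attribute match, equivalent to class="myClass" in the regexes.
$nodes = $xpath->query('//div[@class="myClass"]');

$inner = '';
if ($nodes !== false && $nodes->length > 0) {
    // Serialize each child node to rebuild the div's inner HTML.
    foreach ($nodes->item(0)->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
}
echo $inner, "\n";
```

Unlike the regexes, this is not confused by nested divs or by a `>` inside a quoted attribute value, because the parser builds the element tree before anything is selected.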


answered Jan 15 '14 at 22:56

mate64


For my purpose, this particular solution is probably the *only* way; using the DOM in this case would be, IMO, very hacky. I agree that 99.9% of the time this approach may cause issues, but this use is the 0.1%, hence my request to be spared the topic. – Casey Dwayne Jan 15 '14 at 23:02
@kcdwayne I disagree: you **should never use regex to parse HTML**. Simply use [DOMDocument](http://de2.php.net/domdocument). It's really easy once you understand it: *learning by doing*. – mate64 Jan 15 '14 at 23:06
I *do* understand the DOM, and can traverse it just fine. The point is, all I'm doing is discarding a bogey container that I used to safeguard an important string of PHP. I'm doing it this way to reduce security risks within the new version of my CMS: that way I can eliminate any malicious PHP that might be inserted while protecting my own. You only saw a snippet outlining the problem. I didn't want a lecture; I wanted a solution to a *regex* problem. And the first answer of the linked question is clever; I've seen it several times. – Casey Dwayne Jan 15 '14 at 23:16

score -2 · Answer 3 · Casimir et Hippolyte · 2014-01-15T23:08:34.370

You can use this (simple way):

~<div[^>]+?class="myClass"[^>]*>(.*?)</div>~si

or this (more efficient way if you have a lot of attributes):

~<div(?>[^>c]++|\Bc|c(?!lass=))+class="myClass"[^>]*+>(.*?)</div>~si
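Both patterns exercised on a hypothetical input where the target div has another attribute before `class` and an unrelated div comes first. The comments describe what each piece of the second pattern matches; the efficiency claim is the answer's and is not measured here.

```php
<?php
// Hypothetical input (not from the answer).
$html = '<div id="a" class="other">skip</div><div data-x="1" class="myClass">keep</div>';

// Simple version: [^>]+? cannot cross the end of an opening tag.
$simple = '~<div[^>]+?class="myClass"[^>]*>(.*?)</div>~si';

// "Efficient" version: the atomic group (?>...) repeatedly consumes either a
// possessive run of characters other than ">" and "c" ([^>c]++), a "c" preceded
// by a word character (\Bc), or a "c" not followed by "lass=". Possessive and
// atomic parts never give characters back, which limits backtracking.
$efficient = '~<div(?>[^>c]++|\Bc|c(?!lass=))+class="myClass"[^>]*+>(.*?)</div>~si';

preg_match($simple, $html, $a);
preg_match($efficient, $html, $b);

echo $a[0], "\n"; // <div data-x="1" class="myClass">keep</div>
echo $b[0], "\n"; // <div data-x="1" class="myClass">keep</div>
```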

Note that these patterns don't work if your div tag contains another div tag.
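The nesting limitation, shown with a hypothetical input. The second pattern is a possible workaround that is not from this thread: PCRE recursion, where group 2 matches one balanced `<div>...</div>` and `(?2)` re-enters it for each nested level. It is still a regex with the usual gotchas; for example, a `>` inside a quoted attribute value or a `</div>` inside an HTML comment would still confuse it.

```php
<?php
// Hypothetical input: the target div contains a nested div.
$html = '<div class="myClass">before<div>inner</div>after</div>';

// The lazy (.*?) stops at the first </div>, which belongs to the inner div.
preg_match('~<div[^>]+?class="myClass"[^>]*>(.*?)</div>~si', $html, $m);
echo $m[1], "\n"; // before<div>inner

// Workaround via recursion: text that is neither "<div" nor "</div>", or a
// balanced nested div (group 2, which recurses into itself with (?2)).
$balanced = '~<div\b[^>]*\bclass="myClass"[^>]*>'
          . '((?:(?:(?!<div\b|</div>).)++|(<div\b[^>]*>(?:(?:(?!<div\b|</div>).)++|(?2))*</div>))*)'
          . '</div>~si';
preg_match($balanced, $html, $n);
echo $n[1], "\n"; // before<div>inner</div>after
```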

edited Jan 15 '14 at 23:08

answered Jan 15 '14 at 23:02

