I can't seem to get this to work and I was hoping for some help.

I'm trying to capture the contents of a specific div. (Please save the DOM talk; for this specific purpose it doesn't really come into play.)

The problem is that I can't get it to work if there is another div with attributes before it on the same line. I tried to specify that it should only match if there's no > between <div and class="myClass", but I think I'm doing it wrong.

I'm still pretty mystified by regex.

/<div(?!>).*?class="myClass".*?>(.*?)<\/div>/mi

(semi) Working example: http://regex101.com/r/cW0lW6
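
To make the symptom concrete, here is a minimal sketch of what the pattern above does on a one-line sample. It uses Python's `re` module rather than PHP's `preg_*` functions (an assumption made for illustration; the lookahead and lazy quantifiers behave the same way here), and the sample markup and variable names are invented. The `(?!>)` lookahead only checks the single character right after `<div`, so the following `.*?` is still free to run past the first tag's `>`:

```python
import re

# Invented sample: an unrelated div with an attribute, then the target div.
html = '<div id="a">first</div><div class="myClass">wanted</div>'

# The pattern from the question, with the same m and i flags.
original = re.compile(r'<div(?!>).*?class="myClass".*?>(.*?)</div>', re.M | re.I)

m = original.search(html)
print(m.group(1))  # wanted
print(m.start())   # 0 -> the match begins at the FIRST <div, not the target
print(m.group(0) == html)  # True: the whole line is consumed
```

The capture group happens to be right, but the overall match swallows the preceding div too, which matters as soon as the match is used for a replacement.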

Casey Dwayne

3 Answers

0

Try

/<div(?=\s)(?:(?!>).)+?class="myClass".*?>(.*?)<\/div>/si

  • Cool. Care to elaborate on `(?:(?!>).)+?` ? Why the `.` anychar and `+?` – Casey Dwayne Jan 15 '14 at 22:59
  • It's a fancy way to write the character class `[^>]+`, but what a class can't do is exclude a whole string, as in `(?:(?!some junk string).)+`. Neither is the `?` necessary, nor is `[^>]` entirely correct, but that's for another day that requires a 15-page regex. The `.*?`s aren't all correct either; I thought I would just start out with some basics. –  Jan 15 '14 at 23:17
  • There's all sorts of hooey in these regexes. There are many open questions, like a closing tag amongst `[^>]*`; it's endless, but doable. Most people just want a quick and dirty solution and don't realize the hidden gotchas. –  Jan 15 '14 at 23:23
  • True. Those endless 'ifs and buts' are why I rarely use it, but it does come in very handy for parsing small strings. I'm sure in time it will all make sense. Thanks! – Casey Dwayne Jan 15 '14 at 23:44
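
The comparison made in the comments can be sketched as follows. This is an illustration in Python's `re` module, not PHP (an assumption; the constructs used are common to both), and the sample strings and variable names are made up. With the `s` flag on, `(?:(?!>).)+?` and `[^>]+?` accept the same characters, while the lookahead form can also exclude a multi-character string such as `-->`:

```python
import re

html = '<div id="a">first</div><div class="myClass">wanted</div>'
flags = re.S | re.I

# The answer's pattern, and the same pattern with a negated character class.
tempered = re.compile(r'<div(?=\s)(?:(?!>).)+?class="myClass".*?>(.*?)</div>', flags)
negated = re.compile(r'<div(?=\s)[^>]+?class="myClass".*?>(.*?)</div>', flags)

m_t = tempered.search(html)
m_n = negated.search(html)
print(m_t.group(0))  # <div class="myClass">wanted</div>
print(m_n.group(0))  # <div class="myClass">wanted</div>

# What a character class cannot express: "anything up to the string -->".
comment = re.search(r'<!--((?:(?!-->).)*)-->', 'a<!-- x > y -->b', re.S)
print(repr(comment.group(1)))  # ' x > y '
```

Neither form can cross the first tag's `>`, so the match now starts at the target div.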
0

You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML.

See: RegEx match open tags except XHTML self-contained tags

I suggest using QueryPath for parsing XML and HTML in PHP. Its syntax is much the same as jQuery's, only on the server side.
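
For comparison, a parser-based version of the same task might look roughly like this. It is only a sketch using Python's standard `html.parser` module, not QueryPath or any PHP API; the `DivCapture` class and the sample markup are hypothetical names invented here. It tracks nesting depth, so an inner div does not end the capture early:

```python
from html.parser import HTMLParser


class DivCapture(HTMLParser):
    """Collects the inner markup of divs carrying a given class (hypothetical helper)."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0   # 0 means we are outside the target div
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Inside the target: keep the tag text and track nested divs.
            self.parts.append(self.get_starttag_text())
            if tag == "div":
                self.depth += 1
        elif tag == "div" and self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the depth.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                return  # closing tag of the target div itself
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


html = '<div id="a">first</div><div class="myClass">outer <div>inner</div> tail</div>'
parser = DivCapture("myClass")
parser.feed(html)
parser.close()
inner = "".join(parser.parts)
print(inner)  # outer <div>inner</div> tail
```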

mate64
  • For my purpose, this particular solution is probably the *only* way. Using the DOM in this case would be, imo, very hacky. 99.9% of the time I agree this may cause issues. This use would be the .1% , thus my request to spare me on the topic. – Casey Dwayne Jan 15 '14 at 23:02
  • @kcdwayne I disagree: you **should never use regex to parse HTML**. Simply use [DOMDocument](http://de2.php.net/domdocument). It's really easy, when you understand it - *learning by doing*. – mate64 Jan 15 '14 at 23:06
  • I *do* understand the DOM, and can traverse it just fine. The point is, all I'm doing is discarding a bogey container that I used to safeguard an important string of PHP. I'm doing it this way to reduce security risks within the new version of my CMS, that way I can eliminate any malicious PHP that might be inserted while protecting my own. You only saw a snippet outlining the problem. I didn't want a lecture, I wanted a solution to a *regex* problem. And the 1st answer of the linked question is clever, I've seen it several times. – Casey Dwayne Jan 15 '14 at 23:16
-2

You can use this (simple way):

~<div[^>]+?class="myClass"[^>]*>(.*?)</div>~si

or this (more efficient way if you have a lot of attributes):

~<div(?>[^>c]++|\Bc|c(?!lass=))+class="myClass"[^>]*+>(.*?)</div>~si

Note that these patterns don't work if your div contains a nested div: the lazy (.*?) stops at the first </div> it finds, so the capture ends at the inner div's closing tag.
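
A quick sketch of both behaviours, using the simple pattern only. It is written for Python's `re` module rather than PHP (an assumption for illustration), and the sample strings are invented. The second pattern is not exercised here because its possessive quantifiers and atomic group are PCRE features that older Python versions do not accept:

```python
import re

simple = re.compile(r'<div[^>]+?class="myClass"[^>]*>(.*?)</div>', re.S | re.I)

# Case from the question: another div with attributes earlier on the same line.
flat = '<div id="a">first</div><div class="myClass">wanted</div>'
m_flat = simple.search(flat)
print(m_flat.group(0))  # <div class="myClass">wanted</div>

# The limitation: a nested div cuts the capture short.
nested = '<div class="myClass">outer <div>inner</div> tail</div>'
m_nested = simple.search(nested)
print(m_nested.group(1))  # outer <div>inner
```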

Casimir et Hippolyte