
For example, this works:

{<div\s+class=\"article\"><h2(.*)</div>}s

If I split it across lines like this, nothing matches:

{<div\s+class=\"article\">
    <h2(.*)
 </div>}s
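To see why the second version fails, here is a minimal sketch using Python's `re` module as a stand-in for PHP's PCRE functions (the sample HTML is hypothetical). Without a "verbose" flag, the line break and indentation become literal parts of the pattern, so the regex demands whitespace that the input does not contain.

```python
import re

# Hypothetical sample markup, assumed for illustration only.
html = '<div class="article"><h2>Title</h2><p>Body</p></div>'

# Single-line pattern: every character lines up with the input.
one_line = r'<div\s+class="article"><h2(.*)</div>'

# Multi-line pattern without a verbose flag: the newline and the
# indentation are literal, so "\n    <h2" must appear in the input.
multi_line = r'''<div\s+class="article">
    <h2(.*)
 </div>'''

# re.DOTALL plays the role of PCRE's /s modifier here.
print(bool(re.search(one_line, html, re.DOTALL)))    # True
print(bool(re.search(multi_line, html, re.DOTALL)))  # False
```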

I suspect that I need some modifier, but I don't know which one from this list: http://php.net/manual/en/reference.pcre.pattern.modifiers.php

Anthony
  • And if you insist on using regexes to parse HTML, at least use `.*?` instead of `.*` (unless there is only one such element on the entire page). Also, `{}` is a poor choice for a regex delimiter. Better to use `~` or `#` if you don't want to use `/`.
    – Tim Pietzcker Sep 22 '11 at 09:50
  • Add /siU and the problem should be solved. – ufucuk Sep 22 '11 at 10:01
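Tim Pietzcker's point about `.*` versus `.*?` can be sketched with Python's `re` module, whose greedy and lazy quantifiers behave like PCRE's here; `re.DOTALL` stands in for /s, and the two-article page is made up for illustration. The greedy group runs to the last `</div>` on the page, while the lazy group stops at the first one after each `<h2`.

```python
import re

# Hypothetical page with two article blocks (assumed for illustration).
html = ('<div class="article"><h2>First</h2></div>'
        '<div class="article"><h2>Second</h2></div>')

# Greedy: (.*) backtracks only as far as the LAST </div>.
greedy = re.findall(r'<div\s+class="article"><h2(.*)</div>', html, re.DOTALL)

# Lazy: (.*?) stops at the FIRST </div> after each <h2.
lazy = re.findall(r'<div\s+class="article"><h2(.*?)</div>', html, re.DOTALL)

print(greedy)  # ['>First</h2></div><div class="article"><h2>Second</h2>']
print(lazy)    # ['>First</h2>', '>Second</h2>']
```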

1 Answer


That would be the /x modifier:

x (PCRE_EXTENDED)

If this modifier is set, whitespace data characters in the pattern are totally ignored except when escaped or inside a character class, and characters between an unescaped # outside a character class and the next newline character, inclusive, are also ignored. This is equivalent to Perl's /x modifier, and makes it possible to include comments inside complicated patterns. Note, however, that this applies only to data characters. Whitespace characters may never appear within special character sequences in a pattern, for example within the sequence (?( which introduces a conditional subpattern.
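Python's `re.VERBOSE` flag follows essentially the same rules as the /x modifier described above, so it can serve as a quick sandbox for them (the patterns below are toy examples, not taken from the question). Unescaped whitespace disappears and `#` starts a comment, while an escaped space or a space inside a character class still matches a literal space.

```python
import re

# Unescaped whitespace is ignored and "#" starts a comment,
# so this pattern is effectively just "abc".
loose = re.compile(r'''
    a b c      # the spaces here are ignored
''', re.VERBOSE)

# An escaped space ("\ ") and a space inside a class ("[ ]") still count,
# so this pattern is effectively "a b c".
escaped = re.compile(r'a\ b[ ]c  # trailing comment, also ignored', re.VERBOSE)

print(bool(loose.fullmatch('abc')))      # True
print(bool(loose.fullmatch('a b c')))    # False
print(bool(escaped.fullmatch('a b c')))  # True
```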

It also allows commenting the pattern, which is extremely useful:

{<div\s+class=\"article\">  # many spaces between the div and the attribute
    <h2(.*)                 # don't really care about closing the tag
 </div>}sx
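As a rough cross-check, the commented pattern above can be ported to Python, where `re.VERBOSE` takes the place of /x and `re.DOTALL` the place of /s; the sample HTML is hypothetical, and the quotes no longer need backslashes because Python patterns have no `{}` delimiters.

```python
import re

# Hypothetical sample markup (assumed for illustration).
html = '<div class="article"><h2>Title</h2><p>Body</p></div>'

# Same structure as the commented PCRE pattern; the layout whitespace
# and the "#" comments are stripped by re.VERBOSE.
pattern = re.compile(r'''
    <div\s+class="article">   # whitespace between the tag name and the attribute
    <h2(.*)                   # capture everything after "<h2"
    </div>                    # up to the last closing div
''', re.VERBOSE | re.DOTALL)

m = pattern.search(html)
print(m.group(1))  # >Title</h2><p>Body</p>
```

Because /x removes the layout whitespace, the pattern still requires `<h2` to follow the `>` immediately; if the real markup has a newline or spaces there, an explicit `\s*` between them would be needed.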
Kobi