-1

I wrote a regular expression to match some HTML code, but I can't quite get it to work. I'm having a problem with the part after "wp-caption".

class=(["\'])(?:[\w\s])*?wp-caption[\s\1]

The code I want to match:

class="wp-caption"
class='wp-caption'
class="wp-caption foo"
class="foo wp-caption"

I match the first three results but not the fourth. I don't think the \1 is working. Any thoughts?

BTdubs I've been using http://regexpal.com/ for testing purposes.

BFTrick

3 Answers

0
class=(["\'])(?:[\w\s])*wp-caption[\w\s]*\1
endy
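A quick way to check this pattern against the four lines from the question. This is only a sketch: the thread never says which regex flavor is in use, so Python's `re` module is an assumption here, and the names `ENDY_PATTERN`, `SAMPLES` and `matches` are made up for the example.

```python
import re

# The pattern from the answer above, as a raw triple-quoted string so that
# both quote characters can appear unescaped. Python's re is an assumption;
# the thread does not name a flavor.
ENDY_PATTERN = re.compile(r'''class=(["'])(?:[\w\s])*wp-caption[\w\s]*\1''')

# The four lines the asker wants to match.
SAMPLES = [
    'class="wp-caption"',
    "class='wp-caption'",
    'class="wp-caption foo"',
    'class="foo wp-caption"',
]


def matches(pattern, text):
    """Return True if the pattern is found anywhere in text."""
    return pattern.search(text) is not None


results = [matches(ENDY_PATTERN, s) for s in SAMPLES]
print(results)

# Caveat: [\w\s] does not include "-", so a hyphenated neighbouring class
# such as "my-class" stops the match before the closing quote is reached.
print(matches(ENDY_PATTERN, 'class="wp-caption my-class"'))
```

All four sample lines match because `\1` sits outside any character class, so it really is a backreference to the opening quote. The last line shows one limit of `[\w\s]`: other class names containing a hyphen are not covered.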
0

This might work too

class\s*=\s*(['"])(?:(?!\1).)*wp-caption(?:(?!\1).)*\1
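To see what the lookahead version accepts, here is a small sketch, again assuming Python's `re` (which supports both lookahead and backreferences); `LOOKAHEAD_PATTERN` and `has_wp_caption` are hypothetical names. The piece `(?:(?!\1).)*` reads as "any run of characters that are not the opening quote".

```python
import re

# The lookahead pattern from the answer above. (?!\1) fails wherever the
# next character is the same quote that group 1 captured, so the "." can
# never step over the closing quote.
LOOKAHEAD_PATTERN = re.compile(
    r'''class\s*=\s*(['"])(?:(?!\1).)*wp-caption(?:(?!\1).)*\1'''
)


def has_wp_caption(attr):
    """Return True if attr contains a class attribute mentioning wp-caption."""
    return LOOKAHEAD_PATTERN.search(attr) is not None


for attr in [
    'class="wp-caption"',
    "class='wp-caption'",
    'class="wp-caption foo"',
    'class="foo wp-caption"',
    'class = "a-b wp-caption c-d"',  # spaces around "=" and hyphenated classes
    'class="wp-caption\'',           # mismatched quotes
]:
    print(attr, has_wp_caption(attr))
```

Note that this is a substring test, not a whole-word test: `class="my-wp-caption-x"` also matches, since nothing in the pattern requires whitespace or a quote on either side of `wp-caption`.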
0

It's not working because backreferences can't be used inside a character class (the part inside the square brackets []). As mentioned in another answer, you could use a backreference in a lookahead, unless you're using a language that doesn't support lookaround...

In short, what you need to do depends on the language you are using (regex features vary heavily from one implementation to another).

Code Jockey
  • I haven't even heard about lookaheads. So why does the backreference work with the first three examples? – BFTrick Mar 31 '12 at 16:52
  • @BFTrick honestly, I'm not sure why it would work with the first two - I'm fairly proficient in several flavors of regex (including JavaScript, which I am assuming you are using) and I don't see how your expression could match the first two lines, but I cut and pasted your example expression and code into regexpal.com just to be sure, and it only matches the third line in your example... – Code Jockey Apr 04 '12 at 20:36
  • @BFTrick `[\s\1]` means essentially "match either a whitespace character (space, tab, CrLf, and others) _or_ whatever the flavor makes of `\1` inside a character class" (in JavaScript, for example, that is a legacy octal escape for the control character U+0001, never the captured quote) - if you were trying to identify any class attribute that _contained_ `wp-caption`, then I assume you removed some other classes that were _following_ `wp-caption` and matches were only successful because they hit on the whitespace between those classes - they _should not be able_ to match the first two lines in your example code using your expression. [-->More info on 'lookaround'](http://www.regular-expressions.info/lookaround.html) – Code Jockey Apr 04 '12 at 20:41
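The point about `[\s\1]` can be checked directly. The sketch below assumes Python's `re`, where a numeric escape inside a character class is read as a character code (here the control character `\x01`) rather than as a backreference. It also tries one possible workaround for flavors without lookaround: moving the backreference out of the brackets into an alternation, `(?:\s|\1)`. That alternation is an illustration added here, not something proposed in the thread, and the names `ORIGINAL`, `ALTERNATION` and `SAMPLES` are hypothetical.

```python
import re

SAMPLES = [
    'class="wp-caption"',
    "class='wp-caption'",
    'class="wp-caption foo"',
    'class="foo wp-caption"',
]

# The asker's original pattern: inside [...] the \1 is NOT a backreference.
ORIGINAL = re.compile(r'''class=(["'])(?:[\w\s])*?wp-caption[\s\1]''')

# Assumed workaround: the same idea with \1 outside the character class,
# so it refers to the captured opening quote again.
ALTERNATION = re.compile(r'''class=(["'])(?:[\w\s])*?wp-caption(?:\s|\1)''')

original_hits = [ORIGINAL.search(s) is not None for s in SAMPLES]
alternation_hits = [ALTERNATION.search(s) is not None for s in SAMPLES]

print(original_hits)     # only the line with a space after wp-caption hits
print(alternation_hits)  # all four lines hit
```

With the original pattern only `class="wp-caption foo"` matches, which agrees with what regexpal.com showed in the comment above. The alternation version matches all four lines, at the cost of also accepting anything where `wp-caption` is merely followed by whitespace.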