Regex: ignore HTML tags if there are any

Question

I have a regex pattern:

\(\s*\'\s*(.*?)\s*\'\)

This pattern captures the text between (' and '), i.e. TEXT in ('TEXT').
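Here is a minimal PHP sketch of what the pattern does. It assumes PHP's preg_match(), the function the answers below use, and the helper name capture_quoted() is hypothetical. It also shows the problem: when tags are present, they end up inside the capture.

```php
<?php
// Hypothetical helper: applies the question's pattern and returns
// capture group 1, or null when the input contains no ('...').
function capture_quoted(string $s): ?string
{
    return preg_match("/\(\s*\'\s*(.*?)\s*\'\)/", $s, $m) ? $m[1] : null;
}

// Plain text: group 1 is just the text.
echo capture_quoted("('foo foo text here')"), "\n";
// prints: foo foo text here

// HTML input: the tags are captured along with the text.
echo capture_quoted("('<div class='test'> foo foo text here </div>')"), "\n";
// prints: <div class='test'> foo foo text here </div>
```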

The problem is that the text may contain HTML tags.

So I want a pattern that captures the text normally when there are no HTML tags, but captures only the text between the tags when there are.

Example:

If the text is

('foo foo text here')

the pattern gets:

foo foo text here

And if the text is:

('<div class='test'> foo foo text here </div>')

the pattern gets:

foo foo text here

In short, the pattern should ignore any HTML tags and grab only the text.

I need a button that just does this for me. DON'T REGEX HTML, PARSE IT! — zellio, Aug 03 '11 at 01:49
As usual: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — Ted Kulp, Aug 03 '11 at 01:49
This isn't an HTML parsing question. It's an HTML stripping question disguised as an HTML parsing question. — Michael Berkowski, Aug 03 '11 at 01:51
Tricky, tricky. You're right. I still like excuses to use that link, though. :) — Ted Kulp, Aug 03 '11 at 01:53

score 4 · Accepted Answer · answered Aug 03 '11 at 01:49

You can call strip_tags() on the subject string inside your preg_match() call. That will turn:

('<div class='test'> foo foo text here </div>')

Into:

(' foo foo text here ')

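Your regex, unchanged, then captures the text inside the parentheses, and the \s* on each side of the group trims the leftover spaces. The result ends up in $matches[1]: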

preg_match("/\(\s*\'\s*(.*?)\s*\'\)/", strip_tags($yourstring), $matches);
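A self-contained sketch of this approach, run against both inputs from the question. The function name extract_text() is hypothetical; the pattern and the strip_tags() call are the ones from the line above.

```php
<?php
// Hypothetical helper wrapping the accepted answer: strip the tags first,
// then apply the original pattern unchanged.
function extract_text(string $s): ?string
{
    // strip_tags("('<div class='test'> foo foo text here </div>')")
    // yields "(' foo foo text here ')"; the \s* around the group
    // then trims the leftover spaces.
    if (preg_match("/\(\s*\'\s*(.*?)\s*\'\)/", strip_tags($s), $matches)) {
        return $matches[1];
    }
    return null;
}

foreach (["('foo foo text here')",
          "('<div class='test'> foo foo text here </div>')"] as $input) {
    echo extract_text($input), "\n"; // prints "foo foo text here" both times
}
```

One consequence of this design: anything strip_tags() treats as markup is removed before the regex runs, so text inside the quotes that merely looks like a tag (for example a<b) may be altered.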

Nathan Fox · Answer 2 · 2011-08-03T02:05:47.060

score 0

I believe this works as well:

>\s*(.*?)\s*</|\(\s*\'(?!<)\s*(.*?)\s*\'\)

However, it puts the text in one of two different capture groups: group 1 when the text is wrapped in tags, group 2 when it is not.

At least it might be another option :-)
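Because this pattern puts the text in group 1 or group 2, the caller has to check both. One way around that, sketched here and not part of the original answer, is a PCRE branch-reset group (?|...|...), which gives the capture in each alternative the same number. The sketch also switches the delimiter to ~, since the pattern contains a literal / in </ that would otherwise end a /.../ delimited pattern. The function name extract_text_alt() is hypothetical.

```php
<?php
// Hypothetical helper. (?|...|...) is a PCRE branch reset: the capture in
// each alternative is numbered 1, so the caller always reads $m[1].
// "~" is used as the delimiter because the pattern contains "</".
function extract_text_alt(string $s): ?string
{
    $pattern = "~(?|>\s*(.*?)\s*</|\(\s*'(?!<)\s*(.*?)\s*'\))~";
    return preg_match($pattern, $s, $m) ? $m[1] : null;
}

echo extract_text_alt("('foo foo text here')"), "\n";
// prints: foo foo text here
echo extract_text_alt("('<div class='test'> foo foo text here </div>')"), "\n";
// prints: foo foo text here
```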

edited Aug 03 '11 at 02:05

answered Aug 03 '11 at 02:00
