PHP preg_replace_callback match string but exclude urls

Question

What I'm trying to do is find all the matches within a block of content, but ignore anything that is inside tags, so that I can use the pattern with preg_replace_callback().

For example:

test
<a href="test.com">test title</a>
test

In this case, I want the `test` on the first line and the one on the third line to match, but NOT the one in the URL, nor the one in the link title between the a tags.

I've got a regex that I feel like is close:

#(?!<.*?)(\btest\b)(?![^<>]*?>)#si

(and this will not match the url part)
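A quick way to see what this pattern actually catches is to list its matches with preg_match_all() and PREG_OFFSET_CAPTURE. This is a minimal diagnostic sketch, assuming the three-line example is held in one string; it does not fix anything.

```php
<?php
// Sample input from the question, as a single string.
$html = "test\n<a href=\"test.com\">test title</a>\ntest";

// The pattern from the question. The leading (?!<.*?) only asserts that the
// next character is not "<", which is always true when the next thing must be
// the word "test", so only the trailing lookahead filters anything: it rejects
// a "test" that is followed by a ">" before any "<" (i.e. inside a tag).
$pattern = '#(?!<.*?)(\btest\b)(?![^<>]*?>)#si';

preg_match_all($pattern, $html, $matches, PREG_OFFSET_CAPTURE);

// Each entry of $matches[1] is [matched text, byte offset].
foreach ($matches[1] as [$text, $offset]) {
    echo "'$text' at offset $offset\n";
}
```

On this input it reports offsets 0, 24 and 39: the `test` inside `href` is skipped, but the one in the link text (offset 24) still matches, which is exactly the remaining problem.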

But how do I modify the regex to also exclude the "test" between `<a>` and `</a>`?

`and the fourth line to match` Erm, you only have three lines in your input? — CertainPerformance, Oct 20 '18 at 21:52
Do you have to account for nested tags as well (e.g. a `test` inside an element that is itself inside another element), or self-closing tags? Sounds like a job for something that's *not* a regular expression (HTML and regex generally do not work well together) — CertainPerformance, Oct 20 '18 at 21:55
HTML and regex are not good friends. Use a parser, it is simpler, faster and much more maintainable. See: http://php.net/manual/en/class.domdocument.php — Toto, Oct 21 '18 at 08:41
It doesn't use nested tags, and unfortunately due to the application I have to use regex, but I appreciate the thoughtful question and suggestion. — Ben, Oct 21 '18 at 12:32
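For readers who are free to use a parser, Toto's DOMDocument suggestion could look roughly like this. It is a sketch under a few assumptions: the input is an HTML fragment, it is wrapped in an extra `<div>` so libxml sees a single root, and the upper-casing callback is a hypothetical stand-in for whatever the real replacement does. The XPath query selects only text nodes with no `<a>` ancestor, so neither the `href` value nor the link text is touched.

```php
<?php
// Sample input from the question.
$html = "test\n<a href=\"test.com\">test title</a>\ntest";

$doc = new DOMDocument();
// Wrap the fragment in a <div> so it has a single root element.
// LIBXML_HTML_NOIMPLIED / LIBXML_HTML_NODEFDTD stop libxml from adding
// <html>, <body> and a doctype around it.
$doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($doc);

// Select only text nodes that are not inside an <a> element. Attribute
// values such as href are not text nodes, so they are never visited.
foreach ($xpath->query('//text()[not(ancestor::a)]') as $node) {
    // Hypothetical replacement: upper-case every whole-word "test".
    $node->nodeValue = preg_replace_callback('/\btest\b/i', function (array $m): string {
        return strtoupper($m[0]);
    }, $node->nodeValue);
}

// Serialise the children of the wrapper <div> back into an HTML string.
$out = '';
foreach ($doc->documentElement->childNodes as $child) {
    $out .= $doc->saveHTML($child);
}

echo $out, "\n";
```

Because it works on the document tree, this also copes with text nested deeper inside the link; the trade-off is that the HTML is re-serialised by libxml, which may normalise some markup.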

score 0 · Answer 1 · answered Oct 20 '18 at 22:01

If it's always the same pattern, you can use [A-Z] or a combination like [A-Za-z].

Jake

How is this answering the question? – Toto Oct 21 '18 at 08:41

score 0 · Answer 2 · answered Oct 21 '18 at 12:29

I ended up solving it myself. This regex pattern will do what I wanted:

#(?!<a[^>]*?>)(\btest\b)(?![^<]*?<\/a>)#si
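As a minimal sketch of this pattern inside preg_replace_callback() on the sample input: the `<strong>` wrapping is a hypothetical replacement chosen only to make the surviving matches visible.

```php
<?php
// Sample input from the question.
$html = "test\n<a href=\"test.com\">test title</a>\ntest";

// Ben's pattern. The trailing lookahead rejects any "test" that is followed by
// text containing no "<" and then "</a>", which covers both the href value and
// the link text here. The leading (?!<a[^>]*?>) never fails in practice,
// because the next character must be the "t" of "test".
$pattern = '#(?!<a[^>]*?>)(\btest\b)(?![^<]*?<\/a>)#si';

// Hypothetical replacement: wrap each remaining match in <strong> tags.
$result = preg_replace_callback($pattern, function (array $m): string {
    return '<strong>' . $m[1] . '</strong>';
}, $html);

echo $result, "\n";
```

Only the first and last `test` are wrapped. One trade-off: this pattern only looks for a following `</a>`, so a `test` inside an attribute of some other tag (say `alt="test"` on an `<img>`) can be matched again, whereas the question's `(?![^<>]*?>)` lookahead skipped it. If that matters, keeping both trailing lookaheads, as in `#\btest\b(?![^<>]*>)(?![^<]*</a>)#i`, should exclude both cases, under the same no-nesting assumption.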

Ben
