
I have this HTML code:

<html><body>
    <p>This PHP should be wrapped with an anchor</p>
    <p>This <a href="bla-bla">PHP</a> seems to be already wrapped with an anchor, skip it</p>
    <p>This <b>Android</b> is just another case I want to wrap with an anchor</p>
</body></html>

I would like to find every occurrence of the word PHP inside the paragraphs and wrap it in an anchor tag, except for occurrences that are already wrapped in an a-tag.

I have a WordPress (wordpress.org) blog and I want to write a plug-in that searches for some predefined words (like PHP, Android, etc.) and wraps each of them in an a-tag that points to its Wikipedia page.

So, when this task is done, the above code should look like this:

<html><body>
    <p>This <a href="wikipedia.com/php-link">PHP</a> has been wrapped with an anchor</p>
    <p>This <a href="bla-bla">PHP</a> was skipped because it was already wrapped with an anchor</p>
    <p>This <b><a href="wikipedia.com/android-link">Android</a></b> was also wrapped. Yhhaaa!</p>
</body></html>

Basically my code looks like this:

$html = $xpath->query("/html/body//p//text()");
if ($html) {
    foreach ($html as $par) {
        // I'm trying to find all nodes except those wrapped by an <a> tag
        if ($par->nodeType == XML_TEXT_NODE && $par->nodeValue != $par->parentNode->nodeValue) {
            // find all words within the current node that match my pattern
            preg_match_all('/[A-Z]+[A-Z\-\']{2,}/', $par->nodeValue, $matches);
            foreach ($matches as $match) {
                foreach ($match as $word) {
                    // is the word like PHP, Android, etc.?
                    if (in_array(strtolower($word), $MY_WORDS)) {
                        wrap_this_word($word); // if so then wrap it!
                    }
                }
            }
        }
    }
}

Now, I am able to find my nodes and then find my words, but how do I wrap a word inside the $par node with an a-tag?

It looks like my approach is totally wrong. There must be another way to do that; I just cannot see it right now.

Donald Duck
Eugen Mihailescu

1 Answer


I've found, however, a different approach here:

Regex to match words or phrases in string but NOT match if part of a URL or inside <a> </a> tags. (php)

The idea is to find those words with a regex pattern and wrap them with the help of the preg_replace function.
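
For anyone landing here, this is a minimal sketch of how that regex idea could look. It is not the code from the linked answer: the function name wrap_words_regex, the $links map and its URLs are my own assumptions, and I use preg_replace_callback instead of plain preg_replace so the link can be looked up per word. The pattern first consumes whole a-elements and any other tag, then discards them with the PCRE verbs (*SKIP)(*FAIL), so only words in plain text are left to match.

```php
<?php
// Hypothetical word => URL map; these URLs are illustrative assumptions.
$links = [
    'php'     => 'https://en.wikipedia.org/wiki/PHP',
    'android' => 'https://en.wikipedia.org/wiki/Android_(operating_system)',
];

// Hypothetical helper: wraps each known word that is not inside a tag or an <a> element.
function wrap_words_regex(string $html, array $links): string
{
    if (!$links) {
        return $html; // nothing to look for
    }

    // Build an alternation such as "php|android", escaping regex metacharacters.
    $words = implode('|', array_map(function ($w) {
        return preg_quote($w, '~');
    }, array_keys($links)));

    // Alternatives 1 and 2 consume <a>...</a> and any single tag, then
    // (*SKIP)(*FAIL) throws that text away; alternative 3 matches the words.
    $pattern = '~<a\b[^>]*>.*?</a>(*SKIP)(*FAIL)|<[^>]*>(*SKIP)(*FAIL)|\b(' . $words . ')\b~is';

    return preg_replace_callback($pattern, function (array $m) use ($links) {
        $url = $links[strtolower($m[1])];
        // Keep the word exactly as it was written in the text.
        return '<a href="' . htmlspecialchars($url) . '">' . $m[1] . '</a>';
    }, $html);
}

echo wrap_words_regex('<p>This PHP should be wrapped with an anchor</p>', $links);
```

Keep in mind that a regex only approximates HTML parsing, so unusual markup (comments, script blocks, nested anchors) may not be handled the way you expect.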

That question also has a DOM-based approach; see the answer that has got 3 votes.
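
A DOM-based version that stays close to my original XPath code could be sketched as follows. Again, this is my own assumption of how it might be done, not the code from that answer: wrap_words_dom and the $links map are hypothetical names. The XPath predicate not(ancestor::a) selects only text nodes that are not already inside an anchor, and each matching text node is replaced by a fragment of text and a-elements.

```php
<?php
// Hypothetical word => URL map; these URLs are illustrative assumptions.
$links = [
    'php'     => 'https://en.wikipedia.org/wiki/PHP',
    'android' => 'https://en.wikipedia.org/wiki/Android_(operating_system)',
];

// Hypothetical helper: wraps known words found in <p> text nodes that have no <a> ancestor.
function wrap_words_dom(string $html, array $links): string
{
    if (!$links) {
        return $html; // nothing to look for
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parse warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $words = implode('|', array_map(function ($w) {
        return preg_quote($w, '~');
    }, array_keys($links)));
    $pattern = '~\b(' . $words . ')\b~i';

    $xpath = new DOMXPath($doc);
    // Copy the result to an array so replacing nodes cannot disturb the iteration.
    $nodes = iterator_to_array($xpath->query('//p//text()[not(ancestor::a)]'));

    foreach ($nodes as $text) {
        // With DELIM_CAPTURE, odd indexes hold the matched words, even indexes the text between them.
        $parts = preg_split($pattern, $text->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no known word in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $a = $doc->createElement('a');
                $a->setAttribute('href', $links[strtolower($part)]);
                $a->appendChild($doc->createTextNode($part));
                $fragment->appendChild($a);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $text->parentNode->replaceChild($fragment, $text);
    }

    return $doc->saveHTML();
}

echo wrap_words_dom('<html><body><p>This <b>Android</b> is just another case</p></body></html>', $links);
```

Note that saveHTML() returns the whole document, including a doctype that loadHTML adds, so in a plug-in you would probably want to extract only the body content afterwards.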

I think I've got my answer.

If anyone has a better solution, please feel free to add it here.

Community
Eugen Mihailescu