1

I've been trying to write a regex that matches and replaces the occurrences of a keyword in a portion of HTML:

  1. I want to match keyword and <strong>keyword</strong>
  2. but <a href="someurl.html" target="_blank">keyword</a> and <a href="someur2.html">already linked keyword </a> should NOT be matched

I'm only interested in matching (and replacing) the keyword as it appears in item 1 of the list above.

The reason I want this is to replace keyword with <a href="dictionary.php?k=keyword">keyword</a>, but ONLY if keyword is not already inside an <a> tag.

Any help will be much appreciated!

eykanal
tixastronauta
    I cleaned this up a bit, as the formatting was quite off, but I'm not sure my corrections are completely correct... tixastronauta, if my "fix" introduced mistakes, please edit and correct them. – eykanal Oct 18 '11 at 00:10

4 Answers

3
$str = preg_replace('~Moses(?!(?>[^<]*(?:<(?!/?a\b)[^<]*)*)</a>)~i',
                    '<a href="novo-mega-link.php">$0</a>', $str);

The expression inside the negative lookahead matches up to the next closing </a> tag, but only if it doesn't see an opening <a> tag first. If that inner match succeeds, the word Moses is inside an anchor element, so the negative lookahead fails and no match occurs.

Here's a demo.
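
As a minimal sketch of the lookahead described above, the pattern can be wrapped in a small function so the keyword and URL are parameters. The function name linkOutsideAnchors and its parameters are hypothetical (not from the original answer); only the pattern itself comes from the answer.

```php
<?php
// Hypothetical helper: wrap $keyword in a link unless it already sits inside <a>...</a>.
function linkOutsideAnchors(string $html, string $keyword, string $url): string
{
    // Same structure as the pattern above: the keyword must NOT be followed by
    // a closing </a> that is reached before any other <a ...> or </a> tag.
    $pattern = '~' . preg_quote($keyword, '~')
             . '(?!(?>[^<]*(?:<(?!/?a\b)[^<]*)*)</a>)~i';

    // A callback avoids "$" or "\" in $url being read as backreferences.
    return preg_replace_callback($pattern, function ($m) use ($url) {
        return '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '">' . $m[0] . '</a>';
    }, $html);
}

$str = 'Moses supposes, but <a href="original-moses1.html">Moses</a> supposes, '
     . 'as Moses supposes. <span><a href="original-moses2.html" target="_blank">Moses</a></span>!';

echo linkOutsideAnchors($str, 'Moses', 'novo-mega-link.php'), "\n";
```

One caveat with this approach: the pattern only looks at anchors, so a keyword inside an attribute of some other tag (for example an alt attribute on an image) would still be matched.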

Alan Moore
1

I managed to do what I wanted (without using Regex) by:

  • walking through my string one character at a time
  • removing all <a> tags (copying them to a temporary array and leaving a placeholder in the string)
  • running str_replace on the resulting string to replace all the keywords
  • replacing the placeholders with their original <a> tags

Here's the code I used, in case someone else needs it:

$str = <<<STRA
Moses supposes his toeses are roses,
but <a href="original-moses1.html">Moses</a> supposes erroneously;
for nobody's toeses are posies of roses,
as Moses supposes his toeses to be.
Ganda <span class="cenas"><a href="original-moses2.html" target="_blank">Moses</a></span>!
STRA;

$arr1 = str_split($str);

$arr_links = array();
$phrase_holder = '';
$current_a = 0;
$goto_arr_links = false;
$close_a = false;

foreach($arr1 as $k => $v)
{
    if ($close_a == true)
    {
        if ($v == '>') {
            $close_a = false;
        } 
        continue;
    }

    if ($goto_arr_links == true)
    {
        $arr_links[$current_a] .= $v;
    }

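    // Look ahead safely: a real anchor opens with "<a" followed by whitespace or ">",
    // so tags such as <abbr> or <article> are not mistaken for links.
    $next  = $arr1[$k+1] ?? '';
    $after = $arr1[$k+2] ?? '';
    if ($goto_arr_links == false && $v == '<' && $next == 'a' && ($after == '>' || ctype_space($after))) { /* <a */
        // start a new entry and keep collecting every char until </a>
        $arr_links[$current_a] = $v;
        $goto_arr_links = true;
    } elseif ($goto_arr_links == true && $v == '<' && $next == '/' && $after == 'a' && ($arr1[$k+3] ?? '') == '>') { /* </a> */
        // the "<" was already appended above, so only the rest of the tag is added
        $arr_links[$current_a] .= "/a>";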

        $goto_arr_links = false;
        $close_a = true;
        $phrase_holder .= "{%$current_a%}"; /* put a parameter holder on the phrase */
        $current_a++;
    }    
    elseif ($goto_arr_links == false) {
        $phrase_holder .= $v;
    }
}

echo "Links Array:\n";
print_r($arr_links);
echo "\n\n\nPhrase Holder:\n";
echo $phrase_holder;
echo "\n\n\n(pre) Final Phrase (with my keyword replaced):\n";
$final_phrase = str_replace("Moses", "<a href=\"novo-mega-link.php\">Moses</a>", $phrase_holder);
echo $final_phrase;
echo "\n\n\nFinal Phrase:\n";
foreach($arr_links as $k => $v)
{
    $final_phrase = str_replace("{%$k%}", $v, $final_phrase);
}
echo $final_phrase;

The output:

Links Array:

Array
(
    [0] => <a href="original-moses1.html">Moses</a>
    [1] => <a href="original-moses2.html" target="_blank">Moses</a>
)

Phrase Holder:

Moses supposes his toeses are roses,
but {%0%} supposes erroneously;
for nobody's toeses are posies of roses,
as Moses supposes his toeses to be.
Ganda <span class="cenas">{%1%}</span>!

(pre) Final Phrase (with my keyword replaced):

<a href="novo-mega-link.php">Moses</a> supposes his toeses are roses,
but {%0%} supposes erroneously;
for nobody's toeses are posies of roses,
as <a href="novo-mega-link.php">Moses</a> supposes his toeses to be.
Ganda <span class="cenas">{%1%}</span>!

Final Phrase:

<a href="novo-mega-link.php">Moses</a> supposes his toeses are roses,
but <a href="original-moses1.html">Moses</a> supposes erroneously;
for nobody's toeses are posies of roses,
as <a href="novo-mega-link.php">Moses</a> supposes his toeses to be.
Ganda <span class="cenas"><a href="original-moses2.html" target="_blank">Moses</a></span>!
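
For comparison, the same placeholder idea can be sketched more compactly. Note that this variant does use a regular expression (preg_replace_callback) to pull the anchors out, which the code above deliberately avoids. The function name linkWithPlaceholders is hypothetical, and the sketch assumes the input does not already contain text that looks like a {%n%} placeholder.

```php
<?php
// Hypothetical helper: stash anchors, replace the keyword, restore anchors.
function linkWithPlaceholders(string $html, string $keyword, string $url): string
{
    $links = array();

    // 1. Move every <a ...>...</a> into $links and leave {%n%} in its place.
    $held = preg_replace_callback('~<a\b[^>]*>.*?</a>~is', function ($m) use (&$links) {
        $links[] = $m[0];
        return '{%' . (count($links) - 1) . '%}';
    }, $html);

    // 2. Link the keyword in what is left (case-sensitive, like str_replace above).
    $held = str_replace($keyword, '<a href="' . $url . '">' . $keyword . '</a>', $held);

    // 3. Put the original anchors back.
    foreach ($links as $i => $link) {
        $held = str_replace('{%' . $i . '%}', $link, $held);
    }
    return $held;
}

$str = "Moses supposes his toeses are roses,\n"
     . "but <a href=\"original-moses1.html\">Moses</a> supposes erroneously;\n"
     . "as Moses supposes his toeses to be.";

echo linkWithPlaceholders($str, 'Moses', 'novo-mega-link.php'), "\n";
```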
tixastronauta
0
$lines = explode( "\n", $content );
$lines[0] = str_ireplace( "keyword", "replacement", $lines[0] );
$content = implode( "\n", $lines );

Or, if you explicitly want to use a regular expression:

$lines = explode( "\n", $content );
$lines[0] = preg_replace( "/keyword/i", "replacement", $lines[0] );
$content = implode( "\n", $lines );
Ben Swinburne
-1

Consider using an HTML parsing library, such as simplehtmldom, rather than a regular expression. You can use it to update the contents of specific HTML tags (and therefore ignore the ones you don't want to change). You wouldn't need a regex then; just use a function like str_replace once you've filtered the appropriate tags.
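
To illustrate the parser-based approach, here is a rough sketch that uses PHP's built-in DOMDocument and DOMXPath instead of simplehtmldom (a swap made only so the example has no external dependency). The function name linkKeyword and the wrapper id kw-root are hypothetical. It visits only text nodes that have no <a> ancestor and splits them around the keyword.

```php
<?php
// Hypothetical helper: link $keyword in text nodes that are not inside an <a>.
function linkKeyword(string $html, string $keyword, string $url): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    // Wrap the fragment in a known root; the XML prolog hints UTF-8 to libxml.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="kw-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $root = $xpath->query('//div[@id="kw-root"]')->item(0);
    $pattern = '~(' . preg_quote($keyword, '~') . ')~i';

    // Copy the node list first, because the loop modifies the tree.
    $textNodes = iterator_to_array($xpath->query('.//text()[not(ancestor::a)]', $root));
    foreach ($textNodes as $node) {
        // Odd indexes hold the captured keyword, even indexes the text between.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // keyword not present in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) {
                $a = $doc->createElement('a');
                $a->setAttribute('href', $url);
                $a->appendChild($doc->createTextNode($part));
                $fragment->appendChild($a);
            } else {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$html = 'keyword and <strong>keyword</strong> but '
      . '<a href="someurl.html" target="_blank">keyword</a>';
echo linkKeyword($html, 'keyword', 'dictionary.php?k=keyword'), "\n";
```

Because the replacement happens on text nodes only, attribute values and existing links are never touched, which is the main advantage over a pure regex on the raw markup.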

imm