I am trying to take the text between two symbols or HTML tags and replace every occurrence of the word 'sun' with 'moon' (or < with &lt;), but only inside that tag. For example, I have a $text like the one below:

<body>
    <p>
    text text sun text text...
       <tag> some text here sun some text here </tag>
    text text sun text sun text...
       <span>
           <tag> text here sun text text sun text </tag>
           <tag> sun text here sun text sun, sun </tag>
       </span>
    </p>
</body>

I would like to find every sun between the <tag>...</tag> tags and replace it with moon, so that the result is:

<body>
    <p>
    text text sun text text...
       <tag> some text here moon some text here </tag>
    text text sun text sun text...
       <span>
           <tag> text here moon text text moon text </tag>
           <tag> moon text here moon text moon, moon </tag>
       </span>
    </p>
</body>

I tried $text = str_replace("sun","moon",$text); but this replaces every match, both inside and outside the tags. I also tried preg_replace("/(<tag>)(.*?)sun(.*?)(<\/tag>)/", "$2 moon $3", $text); but it doesn't work as expected.

mac
  • You should look into PHP's [DOMDocument](http://php.net/manual/en/class.domdocument.php) for parsing HTML. Don't use regular expressions, since they aren't suited to non-regular content like HTML. https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – M. Eriksson Sep 01 '18 at 09:48
  • [Try with `preg_replace_callback`](https://www.phprun.org/M8vpHR). But it's better to use an HTML parser. – bobble bubble Sep 01 '18 at 09:50

3 Answers

If the text inside the tags doesn't contain any < characters, one option is to match sun only when it is followed (via a lookahead) by a run of non-< characters and then the closing </tag>:

$str = "<body>
    <p>
    text text sun text text...
       <tag> some text here sun some text here </tag>
    text text sun text sun text...
       <span>
           <tag> text here sun text text sun text </tag>
           <tag> sun text here sun text sun, sun </tag>
       </span>
    </p>
</body>";
$result = preg_replace("/sun(?=[^<]*<\/tag>)/", "moon", $str);

Output:

<body>
    <p>
    text text sun text text...
       <tag> some text here moon some text here </tag>
    text text sun text sun text...
       <span>
           <tag> text here moon text text moon text </tag>
           <tag> moon text here moon text moon, moon </tag>
       </span>
    </p>
</body>
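
The caveat about < inside the tag is easy to see on a small input. The snippet below is only an illustrative sketch; the nested <b> element is a made-up example that is not part of the question. Any sun that has another tag between it and </tag> is skipped, because [^<]* cannot cross the intervening <.

```php
<?php
// Hypothetical input: <tag> content that itself contains markup (<b>...</b>).
$str = '<tag> sun <b>sun</b> sun </tag>';

// Same pattern as above: "sun" followed by non-"<" characters and then </tag>.
$result = preg_replace("/sun(?=[^<]*<\/tag>)/", "moon", $str);

echo $result, "\n";
// Prints: <tag> sun <b>sun</b> moon </tag>
// Only the last "sun" is replaced; the first two are followed by "<b>" or "</b>"
// before "</tag>", so the lookahead fails for them.
```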

That said, using regular expressions to parse HTML is not recommended except in the most trivial cases - consider using a proper HTML parser instead, if at all possible.
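
For comparison, here is a minimal sketch of the parser-based route using PHP's DOMDocument and DOMXPath. It is one possible approach rather than a definitive implementation: the shortened input string and the XPath expression //tag//text() are choices made for this illustration, and since <tag> is not a standard HTML element, libxml's parse errors are collected internally instead of being emitted as warnings.

```php
<?php
// Shortened, hypothetical version of the question's markup.
$html = '<body><p>text sun text <tag> some sun here </tag> sun '
      . '<span><tag> sun, sun </tag></span></p></body>';

// <tag> is not a known HTML element, so keep libxml quiet about it.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select every text node that has a <tag> ancestor and edit only those.
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//tag//text()') as $node) {
    $node->nodeValue = str_replace('sun', 'moon', $node->nodeValue);
}

// Serialize only the <body> element, not the wrapper DOMDocument adds.
$body = $doc->getElementsByTagName('body')->item(0);
$result = $doc->saveHTML($body);

echo $result, "\n";
```

Because the replacement runs on text nodes rather than on the raw string, nested markup inside <tag> does not interfere, and text outside <tag> is never touched.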

CertainPerformance

This should probably be done with an HTML parser, as mentioned in the comments. If you want to use regex, I'd use preg_replace_callback (anonymous functions require PHP >= 5.3).

$text = preg_replace_callback('~<tag>\K.*?(?=</tag>)~s', function ($m) {
  return preg_replace(['~\bsun\b~i','~<~','~>~'], ["moon","&lt;","&gt;"], $m[0]);
}, $text);

See PHP demo at PhpRun.org - without anonymous function:

function rep_tag ($m) {
  return preg_replace(['~\bsun\b~i','~<~','~>~'], ["moon","&lt;","&gt;"], $m[0]);
}

$text = preg_replace_callback('~<tag>\K.*?(?=</tag>)~s', 'rep_tag', $text);

Regex demo and explanation at regex101
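
As a rough inline explanation of the pattern, the sketch below annotates each piece and runs the callback on a small made-up string (the sample text is an assumption for illustration, not taken from the question). It uses &gt; as the entity for the greater-than sign.

```php
<?php
// Pattern pieces:
//   <tag>\K      match the opening tag, then \K drops it from the reported match
//   .*?          lazily match the tag's content (the s flag lets . match newlines)
//   (?=</tag>)   stop right before the closing tag without consuming it
$text = 'sun <tag> Sun rises, 1 < 2 and sunny > dull </tag> sun';

$result = preg_replace_callback('~<tag>\K.*?(?=</tag>)~s', function ($m) {
    // $m[0] is only the content between the tags; the three patterns run in order.
    return preg_replace(['~\bsun\b~i', '~<~', '~>~'], ['moon', '&lt;', '&gt;'], $m[0]);
}, $text);

echo $result, "\n";
// Prints: sun <tag> moon rises, 1 &lt; 2 and sunny &gt; dull </tag> sun
```

Note that \b keeps "sunny" intact, while the i flag means "Sun" is also replaced (with lowercase "moon").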

bobble bubble

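Search: (<tag>)(.*?)sun(.*?)(<\/tag>)

Replace by: \1\2moon\3\4 (groups 2 and 3 keep the text around sun; apply repeatedly, since each pass replaces only one sun per tag)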
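
One way to run a search of this shape from PHP is sketched below. It is an adaptation, not this answer's exact pattern: each pass can replace at most one sun per <tag>...</tag> pair, so the call is repeated until preg_replace reports zero replacements, and the replacement refers to all four groups so the surrounding text is kept. The plain .*? is also swapped for (?:(?!</tag>).)*? as a precaution, since otherwise a tag without any sun could let the match run past its </tag> and pick up a sun outside. The sample string is made up for illustration.

```php
<?php
$text = '<p>sun <tag> sun text sun, sun </tag> sun <tag> none </tag> sun</p>';

// (?:(?!</tag>).)*? matches lazily but never steps over a closing </tag>.
$pattern = '~(<tag>)((?:(?!</tag>).)*?)sun((?:(?!</tag>).)*?)(</tag>)~s';

do {
    // Keep groups 1-4 and swap only the literal "sun" between groups 2 and 3.
    $text = preg_replace($pattern, '${1}${2}moon${3}${4}', $text, -1, $count);
} while ($count > 0);

echo $text, "\n";
// Prints: <p>sun <tag> moon text moon, moon </tag> sun <tag> none </tag> sun</p>
```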