
I normally use this code to remove every img tag from a string. It works well.

<?PHP
$string = "<b>test</b><img src=\"https://www.google.co.th/images/nav_logo242.png\"><script>alert();</script>";
$string = preg_replace("/<img[^>]+>/", "", $string);
echo $string;
?>

Then I adapted the code to remove every script tag together with its contents. This is my code:

<?PHP
$string = "<b>test</b><img src=\"https://www.google.co.th/images/nav_logo242.png\"><script>alert();</script>";
$string = preg_replace("/<scrip[^>]+script>/", "", $string);
echo $string;
?>

When I test this code, the script tag and its contents are not removed. Why?

1 Answer


Your code doesn't work because your pattern matches <scrip, followed by one or more characters other than >, followed by script>.

There is no such substring in your content. In your $string, after <scrip you have a t (which matches [^>]+) and then you have a > instead of script>. So, no match.
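The failed match can be checked directly. This is a minimal sketch using the $string from the question; the variable names $matched and $m are only illustrative.

```php
<?php
$string = "<b>test</b><img src=\"https://www.google.co.th/images/nav_logo242.png\"><script>alert();</script>";

// The original pattern: "<scrip", then one or more non-">" characters, then "script>".
$matched = preg_match("/<scrip[^>]+script>/", $string);
var_dump($matched); // int(0): the pattern never matches

// Dropping the trailing "script>" shows how far [^>]+ can get:
// it consumes the "t" and then stops at the first ">".
preg_match("/<scrip[^>]+/", $string, $m);
var_dump($m[0]); // string(7) "<script"
```

Because [^>]+ can never step over the > that closes the opening tag, the pattern has no way to reach the </script> further along.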

Here's what you need to do instead:

$string = preg_replace("/<script.*?<\/script>/si", "", $string);

You cannot use [^<] or [^>] here because JavaScript code may itself contain many < and > characters.

Here's what the above regex does:

• Search for <script
I intentionally did not include the closing > bracket here, because the script tag may have attributes, like <script type='text/javascript'>

• Followed by any sequence of random characters, using lazy evaluation
Note the .*? instead of .*: this captures as few characters as possible to find a match, instead of as many as possible. This avoids the following problem:
<script>something</script> other content <script>more script</script>
Without lazy evaluation, it would remove everything from the first <script> to the last </script>

• Followed by </script> to mark the end of the script section
Note I'm escaping the slash (\/ instead of /) because / is the regex delimiter character here. We could also have used a different character at the beginning and end of the regex, like #, and then the / would not need to be escaped.

• Finally, I added the s and i modifiers. The s modifier makes . match newline characters as well: JavaScript code can of course contain line breaks, and we want .*? to match across them. The i modifier makes the match case-insensitive, because I assume you want to replace <Script> or <SCRIPT> too.
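The points above can be tried side by side. This is a rough sketch, not a definitive implementation: the helper name strip_scripts and the sample $html are made up for illustration, and # is used as the delimiter so the slash needs no escaping.

```php
<?php
// Hypothetical sample input: two script blocks, the second one uppercase,
// with an attribute and with line breaks inside it.
$html = "<script>something</script> other content <SCRIPT type='text/javascript'>\nmore script\n</SCRIPT>";

// Hypothetical helper wrapping the regex from the answer, written with # delimiters.
function strip_scripts(string $html): string
{
    // preg_replace returns null on a regex error; fall back to the input in that case.
    return preg_replace('#<script.*?</script>#si', '', $html) ?? $html;
}

// Lazy .*? with both modifiers: each script block is removed on its own.
$lazy = strip_scripts($html);

// Greedy .* runs from the first <script to the last </SCRIPT>,
// so the text between the two blocks is lost as well.
$greedy = preg_replace('#<script.*</script>#si', '', $html);

// Without s and i, only the first (lowercase, single-line) block is removed.
$noFlags = preg_replace('#<script.*?</script>#', '', $html);

var_dump($lazy);    // string(15) " other content "
var_dump($greedy);  // string(0) ""
var_dump($noFlags); // the uppercase, multi-line block is still there
```

With the lazy pattern and both modifiers, only " other content " is left, which is the behaviour the answer is aiming for.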

RocketNuts