
I use the following code to remove script and link tags from my string:

$contents='<script>inside tag</script>hfgkdhgjh<script>inside 2</script>';
$ss=preg_replace('#<script(.*?)>(.*?)</script>#is', '', $contents);
echo htmlspecialchars($ss);

It works fine, but is there something closer to HTML parsing that I can use for this instead of preg_replace?

joHN
  • Have you made sure you're getting something back from your call? – jprofitt Mar 31 '12 at 04:46
  • strip_tags might help http://www.php.net/manual/en/function.strip-tags.php ... look at user contributed scripts at the bottom of the page – gpasci Mar 31 '12 at 04:57
  • @jprofitt: Yes, it returns the site contents after stripping the tags and their contents. – joHN Mar 31 '12 at 05:58

2 Answers


Here are a few things you can do:

  1. htmlspecialchars() escapes the tags, so the browser displays them as text instead of interpreting them
  2. strip_tags() removes all HTML tags, but keeps the text between them
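
To make the difference between the two functions concrete, here is a minimal sketch that applies both to the sample string from the question. The variable names `$escaped` and `$stripped` are only for illustration.

```php
<?php
// Sample input taken from the question.
$contents = '<script>inside tag</script>hfgkdhgjh<script>inside 2</script>';

// htmlspecialchars() leaves everything in place but converts < and > to
// entities, so a browser shows the markup as literal text.
$escaped = htmlspecialchars($contents);

// strip_tags() removes the tags themselves but keeps whatever text sat
// between them, including the former script bodies.
$stripped = strip_tags($contents);

echo $escaped, "\n";
echo $stripped, "\n";
```

Note that neither call drops the text inside the script elements, which is what the regular expression in the question does.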

But the technique you are using is the correct one. However, here is an improved version of it:

echo preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', "", $contents);
Starx

HTML Purifier is always a good choice. phpQuery has also come in handy a few times.
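
Neither of those libraries is shown here. As a dependency-free illustration of the same parser-based idea, the following is a rough sketch using PHP's bundled DOMDocument class instead. The function name `removeElements` and the wrapping `<div>` are assumptions made for this example, not part of any library.

```php
<?php
// Hypothetical helper: removes every element whose tag name is listed in
// $tagNames and returns the remaining markup. Assumes 'div' is not one of
// the names, because a <div> is used internally as a wrapper.
function removeElements(string $html, array $tagNames): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration hints UTF-8; the <div> gives the fragment a
    // single known container that is easy to find again afterwards.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($tagNames as $tagName) {
        // The node list is live, so copy it before removing anything.
        $nodes = iterator_to_array($doc->getElementsByTagName($tagName));
        foreach ($nodes as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize only the children of the wrapper, not the html/body
    // elements that the parser added around it.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $result = '';
    foreach ($wrapper->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }
    return $result;
}

$contents = '<script>inside tag</script>hfgkdhgjh<script>inside 2</script>';
echo htmlspecialchars(removeElements($contents, ['script', 'link']));
```

For the sample string from the question, this leaves only the text between the two script elements. It only removes the listed elements; it is not a substitute for a full sanitizer such as HTML Purifier.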

If you are sanitizing content, it's very easy to make mistakes with regular expressions... read this post. It just depends on what you're trying to achieve.
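
As one illustration of how a regular expression can go wrong here, the sketch below feeds a deliberately crafted string (an assumption for this example, not taken from the question) to the pattern from the other answer. A single pass removes the inner empty script element, and the leftover pieces join up into a new one.

```php
<?php
// Pattern from the other answer on this page.
$pattern = '/<script\b[^>]*>(.*?)<\/script>/is';

// Crafted input: an empty script element hidden inside the word "script".
$tricky = '<scr<script></script>ipt>alert(1)</script>';

// One pass removes only the inner element, so the outer fragments
// concatenate into a complete script element.
$once = preg_replace($pattern, '', $tricky);
echo htmlspecialchars($once), "\n";

// Repeating the replacement until the string stops changing handles this
// particular trick, though it does not make the regex a real sanitizer.
$clean = $tricky;
do {
    $before = $clean;
    $clean = preg_replace($pattern, '', $before);
} while ($clean !== $before);
echo htmlspecialchars($clean), "\n";
```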

jmlnik