Is there anything in Java SE, Spring or Apache Commons StringUtils that would let me strip HTML out of a String while also supplying a whitelist of HTML tags that I would like to allow?

Thanks

csilk
  • Using Apache Commons Lang 2.4 and Spring 3.1 – csilk Jul 30 '12 at 09:35
  • After reading through a lot of documentation I don't think what I'm looking for exists. I will have to either use a regex to strip out what I don't want (a blacklist) or find another way to remove everything except what I do want. I'm not happy with the blacklist approach, though, as something will surely be left out. I'm surprised that none of the libraries we're using is flexible enough to do this for us. – csilk Jul 30 '12 at 10:00

You can take a look at OWASP AntiSamy project. It has default policy files, but you can tailor them to your needs; see the Developer Guide for details on that.

mthmulders
  • I'm unlikely to be able to add new libraries to the project, so I'm looking to work with what's already available. – csilk Jul 30 '12 at 09:34
  • You could use [StringEscapeUtils.html#escapeHtml(java.lang.String)](http://commons.apache.org/lang/api-2.4/org/apache/commons/lang/StringEscapeUtils.html#escapeHtml%28java.lang.String%29) but it doesn't allow for whitelisting AFAIK... – mthmulders Jul 30 '12 at 09:38
  • I don't think what I want exists within the framework and libraries we're using. This was resolved using a regular expression. – csilk Aug 05 '12 at 19:58

Using a regular expression was the answer here. There was no solution to this issue in any of the libraries available to me.
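
The regex itself wasn't posted, so the following is only a sketch of the whitelist idea, assuming nothing beyond `java.util.regex` from Java SE: match every tag, keep it if its name is on the list and drop it otherwise. `WhitelistTagStripper` and `strip` are hypothetical names, the allowed tag names are expected in lower case, and allowed tags are re-emitted without attributes (so whitelisting `a` keeps the tag but loses its `href`). A regex is not an HTML parser: comments, `>` inside quoted attribute values and malformed markup are not handled, and the text inside a removed `<script>` element is kept as plain text, so this should not be relied on as an XSS filter.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: removes every HTML tag whose name is not whitelisted.
final class WhitelistTagStripper {

    // Matches an opening, closing or self-closing tag; group 1 is the tag name.
    // Caveat: a '>' inside a quoted attribute value ends the match early.
    private static final Pattern TAG =
            Pattern.compile("</?([a-zA-Z][a-zA-Z0-9]*)\\b[^>]*>");

    static String strip(String html, Set<String> allowedTags) {
        Matcher m = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            String replacement = "";
            if (allowedTags.contains(name)) {
                // Re-emit an allowed tag bare, dropping all of its attributes
                // (onclick, style, href, ...).
                replacement = m.group().startsWith("</")
                        ? "</" + name + ">"
                        : "<" + name + ">";
            }
            // Disallowed tags become ""; the text between tags is kept.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> allowed = new HashSet<String>(Arrays.asList("b", "i"));
        String dirty = "<p>Hello <b onclick=\"evil()\">bold</b> "
                + "<script>alert(1)</script><i>world</i></p>";
        // Prints: Hello <b>bold</b> alert(1)<i>world</i>
        System.out.println(strip(dirty, allowed));
    }
}
```

Rebuilding the kept tags instead of copying them through verbatim is deliberate: passing `<b onclick="...">` through unchanged would let an event handler ride along on a whitelisted tag.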

csilk