I'm using PageDown, the reimplementation of the well-known WMD JavaScript editor (also used on Stack Overflow), on the client side.

Now I'm looking for an HTML sanitizer for my server (running Tomcat 7) that allows only the HTML subset the PageDown editor can produce.

My first choice was the OWASP project, but I didn't find an XML rule file for PageDown. The rule file for TinyMCE was too restrictive because it doesn't include, for example, the "img" tag.

Building my own rule set is not only quite painful, it also raises security concerns. For this reason I'd like to ask whether there are Java classes, OWASP rules, or anything else out there that have already been tested.

Help would be much appreciated!

Thanks in advance, Thomas

Thomas Pototschnig

3 Answers

You can use jsoup.
It allows you to whitelist the elements you want in the resulting HTML.

See the jsoup cookbook entry on cleaning HTML with a whitelist sanitizer.
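A minimal sketch of what such a whitelist could look like for Markdown-style output. The class name `PageDownJsoupSanitizer` and the extra tags added on top of jsoup's built-in list are assumptions about what PageDown emits, not a tested rule set. It also assumes jsoup 1.14.2 or later, where the class is named `Safelist`; older releases call it `Whitelist`.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class PageDownJsoupSanitizer {

    // Start from jsoup's built-in "basic + images" list (inline formatting,
    // lists, blockquote, pre/code, links, img with http/https sources) and
    // add headings and horizontal rules. The added tags are an assumption
    // about PageDown's output and should be checked against the editor.
    private static final Safelist PAGEDOWN_SUBSET = Safelist.basicWithImages()
            .addTags("h1", "h2", "h3", "h4", "h5", "h6", "hr")
            .addAttributes("a", "title");

    // Returns a cleaned body fragment; anything not on the list is dropped.
    public static String sanitize(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, PAGEDOWN_SUBSET);
    }
}
```

Calling `PageDownJsoupSanitizer.sanitize(...)` on the server before storing or rendering the submitted HTML should strip `script` elements and event-handler attributes while keeping the listed tags.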

Chandra Sekhar

Use HTML Purifier, html5lib, or another system built specifically for HTML sanitization. (Since you asked about OWASP: The good ones will use the OWASP whitelist of allowed tags, attributes, and other syntax.)

D.W.

OWASP's new HTML Sanitizer doesn't require you to maintain rules in an XML configuration language.

It comes with prepackaged policies that can be combined, and if you need a custom policy, you can define it in Java code.
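As a rough sketch of that approach: the prepackaged policies in `org.owasp.html.Sanitizers` can be joined with `and(...)`, and a small custom policy built with `HtmlPolicyBuilder` can be joined in the same way. The class name `PageDownOwaspSanitizer` and the choice of policies and extra elements are assumptions about what PageDown produces, not a vetted configuration.

```java
import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;
import org.owasp.html.Sanitizers;

public class PageDownOwaspSanitizer {

    // Custom policy written in Java instead of XML. The element choice here
    // (pre, hr) is an assumption about PageDown's output.
    private static final PolicyFactory EXTRA_BLOCKS = new HtmlPolicyBuilder()
            .allowElements("pre", "hr")
            .toFactory();

    // Union of prepackaged policies plus the custom one above.
    private static final PolicyFactory POLICY = Sanitizers.FORMATTING
            .and(Sanitizers.BLOCKS)
            .and(Sanitizers.LINKS)
            .and(Sanitizers.IMAGES)
            .and(EXTRA_BLOCKS);

    // Returns HTML containing only what the combined policy allows.
    public static String sanitize(String untrustedHtml) {
        return POLICY.sanitize(untrustedHtml);
    }
}
```

Because the policy is plain Java, it can be unit-tested alongside the rest of the server code, for example by asserting that known attack strings come back without `script` elements.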

Mike Samuel