3

I use OWASP AntiSamy with the eBay policy file to prevent XSS attacks on my website.

I also use Hibernate Search to index my objects.

When I use this code:

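import org.owasp.validator.html.AntiSamy;
import org.owasp.validator.html.CleanResults;
import org.owasp.validator.html.Policy;

String html = "special word: été";

// use the eBay configuration file
Policy policy = Policy.getInstance(xssPolicyFile.getInputStream());

AntiSamy as = new AntiSamy();
CleanResults cr = as.scan(html, policy);

// result is now: "special word: &eacute;t&eacute;"
String result = cr.getCleanHTML();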

As you can see, every "é" has been transformed into its HTML entity equivalent, "&eacute;".

My page is served as UTF-8, so I don't need this transformation. Moreover, when I index this text with Hibernate Search, it indexes the word with its HTML entities, so I can't find the word "été" in my index.

How can I force AntiSamy not to transform special characters into their HTML entity equivalents?

thanks

PS: an issue has been opened : http://code.google.com/p/owaspantisamy/issues/detail?id=99

Jerome Cance

4 Answers

3

I ran into the same problem this morning.

I have encapsulated AntiSamy in a class, and I use StringEscapeUtils from Apache Commons Lang to restore the special characters.

 CleanResults cleanResults = antiSamy.scan(taintedHtml);
 String cleanedHtml = cleanResults.getCleanHTML();
 return StringEscapeUtils.unescapeHtml(cleanedHtml);

The result is a cleaned up HTML without the HTML escaping of special characters.

Hope this helps.

  • This can be really insecure, can't it? If I clean "été < hiver" and then unescape, the "<" comes back, and that is dangerous... – Jerome Cance Jan 28 '11 at 15:10
  • @Jerome C., were you able to solve this issue? I got it to work with this suggestion. Why do you think it's insecure? – Mohamad Jun 15 '11 at 16:41
  • Well, I think unescaping HTML after AntiSamy has escaped it can be quite dangerous, because unescaping restores not only the entities for special characters but possibly also encoded dangerous characters like < > " ' – Jerome Cance Jun 17 '11 at 15:28
  • @Jerome, but if you use AntiSamy first to sanitize input, and then unescape, why would it make a difference? – Mohamad Aug 19 '11 at 17:01
  • @JeromeC. I'm just curious, what was your solution to this? AntiSamy was just updated to include special character escaping in a directive. – Mohamad Sep 20 '11 at 23:10
  • Thank you so much for the update! I have no solution for the moment, but it seems that "entityEncodeIntlChars" will definitely solve this. I haven't tested it yet, but I will give it a try in a few weeks. – Jerome Cance Sep 22 '11 at 15:10
2

As Mohamad said in a comment, AntiSamy has just released a new directive named entityEncodeIntlChars.

Here are the details: http://code.google.com/p/owaspantisamy/source/detail?r=240

It seems that this directive solves the problem.
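
For illustration, here is a rough sketch of how the directive might look in a policy file, assuming it follows the same `<directive name="..." value="..."/>` form as the other directives and that `false` turns the entity encoding off. The XML below is a trimmed-down fragment, not a complete policy, and the `DirectiveCheck` class and `directiveValue` helper are hypothetical names used only to read the value back with the JDK's XML parser; they are not part of AntiSamy.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DirectiveCheck {

    // Illustrative fragment only: a real policy file (such as the eBay one)
    // contains many more sections than this.
    public static final String POLICY_FRAGMENT =
            "<anti-samy-rules>"
          + "<directives>"
          + "<directive name=\"entityEncodeIntlChars\" value=\"false\"/>"
          + "</directives>"
          + "</anti-samy-rules>";

    /** Returns the value of the named directive, or null if the policy does not set it. */
    public static String directiveValue(String policyXml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(policyXml.getBytes(StandardCharsets.UTF_8)));
            NodeList directives = doc.getElementsByTagName("directive");
            for (int i = 0; i < directives.getLength(); i++) {
                Element directive = (Element) directives.item(i);
                if (name.equals(directive.getAttribute("name"))) {
                    return directive.getAttribute("value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse policy XML", e);
        }
    }

    public static void main(String[] args) {
        // Reads the directive back from the fragment above.
        System.out.println(directiveValue(POLICY_FRAGMENT, "entityEncodeIntlChars"));
    }
}
```

A check like this can be handy in a unit test, to make sure the policy file actually deployed with the application carries the directive.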

Jerome Cance
  • Yes, DO NOT use StringEscapeUtils.unescapeHtml(cleanedHtml), as it opens you up to XSS attacks, as explained in the comments. Unescaping will not only unescape the entities but also any escaped HTML existing in the data. – Erlend Aug 14 '12 at 09:55
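
If the directive is not available in your AntiSamy version, one possible workaround is to decode only a whitelist of entities for international characters and leave markup-significant ones such as `&lt;`, `&gt;`, `&amp;` and `&quot;` alone. The following is a minimal sketch of that idea, not a vetted security control: the `SelectiveUnescape` class, the `restoreIntlChars` method and the tiny entity table are hypothetical and belong to neither AntiSamy nor Commons Lang.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SelectiveUnescape {

    // Hypothetical, deliberately small whitelist of entities to restore.
    // Markup-significant entities (lt, gt, amp, quot) are absent on purpose.
    private static final Map<String, String> INTL_ENTITIES = Map.of(
            "eacute", "\u00e9",   // é
            "egrave", "\u00e8",   // è
            "agrave", "\u00e0",   // à
            "ccedil", "\u00e7",   // ç
            "ocirc", "\u00f4");   // ô

    private static final Pattern NAMED_ENTITY = Pattern.compile("&([A-Za-z]+);");

    /** Restores only the whitelisted entities; every other entity is left untouched. */
    public static String restoreIntlChars(String cleanedHtml) {
        Matcher m = NAMED_ENTITY.matcher(cleanedHtml);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Fall back to the original entity text when it is not whitelisted.
            String replacement = INTL_ENTITIES.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The accented letters are restored, while "&lt;" stays escaped.
        System.out.println(restoreIntlChars("&eacute;t&eacute; &lt; hiver"));
    }
}
```

A real entity table would need to cover every character your content uses, and numeric references such as `&#233;` are not handled here.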
0

After scouring the AntiSamy source code, I found no way of changing this behavior apart from modifying AntiSamy.

0

Check out this one: http://code.google.com/p/owaspantisamy/source/browse/#svn/trunk/dotNet/current/source/owaspantisamy/html/scan

Grab the source and notice that the key classes (AntiSamyDOMScanner, CleanResults) use standard framework objects (like XmlDocument). Compile it and run against the binary you compiled, so that you can step through everything in a debugger and see which of the major classes actually corrupts your data. With that in hand, you'll be able to either change a few properties on the major objects to make it stop, or inject your own post-processing to revert the wrongdoing (say, with a regexp). Later you can expose that as an additional top-level property, say one named NoMess :-)

Chances are that the behavior in that respect differs between languages (there are 3 in that trunk), but the same tactics will work no matter which one you have to deal with.

ZXX
  • I've thought about that, but it's very tricky because I use Maven, so I never download the entire project source to modify it. It's strange that AntiSamy does not implement such a feature. I think I will do it, but it's really ugly. – Jerome Cance Aug 27 '10 at 07:53
  • Well, it's a good excuse to start decoupling :-) It might also be a matter of "assumed" output encoding. – ZXX Aug 27 '10 at 09:10