
I store HTML data in a database.

The HTML is very simple and is generated by a WYSIWYG editor.

Before I store the HTML in the database, I run it through HTMLPurifier to remove anything malicious.

When I output the data back to the browser, I obviously cannot use PHP's htmlspecialchars(), because the HTML needs to render.

Are there any problems with this approach as far as XSS attacks are concerned? Is passing the data through HTMLPurifier before saving it to the database enough, or is there anything I am missing or other steps I should be taking?

Thanks (in advance) for your help.

Sripathi Krishnan

3 Answers


What you are doing is correct. You may also consider filtering on the way out as well, just to be sure. You mentioned you are using HTMLPurifier, which is great. Just don't ever try to implement a sanitizer on your own; there are lots of pitfalls in that approach.

Sripathi Krishnan

I've never had an issue with mainstream rich-text editors.

XSS happens when people are able to embed raw HTML into your page through web forms whose input you later output (so always encode user input when writing it to the page).

This can't happen with a (good) text editor. If a user types HTML characters (e.g. < or >), the editor will encode them anyway. The only tags it will create are its own.

jenson-button-event
  • Not true. There will usually be an AJAX call or a form submit that sends the HTML to the server. An attacker can bypass the editor and easily modify that request to contain arbitrary JavaScript. If you don't filter it on the server side, you have a stored XSS problem. – Sripathi Krishnan Feb 26 '11 at 18:13

There is a function, htmlspecialchars(), that encodes special characters as their HTML entities. For example, < becomes &lt;.

In addition, you may want to clean out any suspicious tags. I wrote a short JS function a while ago to do this for a project (by no means all-inclusive!). You may want to take it and edit it for your needs, or base your own on it...
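For reference, here is a minimal JavaScript sketch of what htmlspecialchars() does, mirroring its behavior with the ENT_QUOTES flag (where ' becomes &#039;). The names `escapeHtml` and `renderPost` are hypothetical. It also illustrates the distinction the question turns on: escaping is right for plain-text fields, but applied to the purified HTML body it would make the markup display as literal text instead of rendering.

```javascript
// Rough JavaScript equivalent of PHP's htmlspecialchars() with ENT_QUOTES:
// escapes the five characters that are significant in HTML.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#039;",
};

function escapeHtml(text) {
  // One pass over a character class, so the "&" of a freshly inserted
  // entity is never escaped a second time.
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Hypothetical page fragment: the plain-text title is escaped, while the
// body is assumed to have already been cleaned by HTMLPurifier on the server
// and is therefore inserted as-is so that it renders.
function renderPost(title, purifiedBody) {
  return "<h2>" + escapeHtml(title) + "</h2>\n" + purifiedBody;
}

console.log(escapeHtml("<b>bold</b>"));
// &lt;b&gt;bold&lt;/b&gt;
```

Note that escaping the stored HTML itself would turn `<b>bold</b>` into the visible text "<b>bold</b>", which is why the comment further down says htmlspecialchars() doesn't fit the rich-text body.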

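    <script type="text/javascript">

    function Button1_onclick() {
        // get text
        var text = document.getElementById("txtIn").value;
        // wipe it
        text = wipe(text);
        // give it back
        document.getElementById("txtOut").value = text;
    }

    // Strip blacklisted markup. Blacklist filtering is easy to evade,
    // so this is NOT a substitute for a server-side sanitizer.
    function wipe(text) {
        text = stripScriptBlocks(text);
        text = stripTags(text);
        return text;
    }

    // Remove <script>...</script> blocks, including their contents.
    // [\s\S] also matches newlines; the "i" flag also catches <SCRIPT>.
    // '</scri' + 'pt' keeps a literal closing tag out of this inline script.
    function stripScriptBlocks(text) {
        var re = new RegExp('<script\\b[^>]*>[\\s\\S]*?</scri' + 'pt\\s*>', 'gi');
        return text.replace(re, '');
    }

    // Remove opening and closing tags for each listed element name.
    // The lookahead requires the name to end there, so "o" (presumably meant
    // for Word's <o:p>) no longer also strips <ol> or <option>.
    function stripTags(text) {
        var tags = ["html", "body", "head", "!doctype", "script", "embed", "object", "frameset", "frame", "iframe", "meta", "link", "div", "title", "w", "m", "o", "xml"];
        for (var x = 0; x < tags.length; x++) {
            var re = new RegExp('</?' + tags[x] + '(?=[\\s/>:])[^<>]*>', 'gi');
            text = text.replace(re, '');
        }
        return text;
    }
    </script>
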
Michael Jasper
  • htmlspecialchars() doesn't work in this case, because the OP wants the HTML to render, not display as text. Also, do not recommend writing an HTML filter/purifier, especially one that is client-side only and blacklist-based; there are many ways to evade such a filter. Try this cheat sheet: http://ha.ckers.org/xss.html – Sripathi Krishnan Feb 26 '11 at 18:25
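To make the evasion point concrete, here is a condensed, hypothetical sketch modeled on the regex tag-stripping approach from the answer above (the `stripTags` name and the shortened tag list are illustrative, not the answer's exact code). Payloads that avoid blacklisted tag names pass through untouched, and a nested tag gets reassembled into a working `<script>` element by the very replacement that was meant to remove it.

```javascript
// Condensed blacklist filter: delete opening/closing tags whose name is listed.
const BLACKLIST = ["script", "iframe", "object", "embed", "meta", "link"];

function stripTags(text) {
  for (const tag of BLACKLIST) {
    // Case-insensitive; the lookahead requires the tag name to end here.
    const re = new RegExp("</?" + tag + "(?=[\\s/>])[^<>]*>", "gi");
    text = text.replace(re, "");
  }
  return text;
}

// None of the first three payloads uses a blacklisted tag, so their event
// handlers and javascript: URL survive. The fourth hides "<script>" inside
// itself: removing the inner tags joins the outer fragments back together.
const tricky = [
  '<img src="x" onerror="alert(1)">',
  '<svg onload="alert(1)">',
  '<a href="javascript:alert(1)">click me</a>',
  "<scr<script>ipt>alert(1)</scr</script>ipt>",
];

for (const input of tricky) {
  console.log(stripTags(input));
}
// <img src="x" onerror="alert(1)">
// <svg onload="alert(1)">
// <a href="javascript:alert(1)">click me</a>
// <script>alert(1)</script>
```

A whitelist sanitizer such as HTMLPurifier avoids this class of bug by parsing the markup and keeping only known-safe elements and attributes, rather than deleting known-bad strings.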