How to prevent XSS attacks when I need to render HTML from a WYSIWYG editor?

Question

Non-Technical Background info: I work for a school, and we are building a new website using Django. The teachers who work for the school aren't technical enough to use a markup language such as Markdown. We eventually decided to use a WYSIWYG editor, which poses security risks. We aren't too worried about the teachers themselves, but rather about malicious students who might get hold of a teacher's credentials.

Technical Background info: We are running Django 1.3 and have not chosen a specific editor yet. We are leaning towards a JavaScript one such as TinyMCE, but can be persuaded to use anything that offers security and ease of use. Because the WYSIWYG editor will output HTML to be rendered into the document, we cannot simply escape it.

What is the best way to prevent malicious code while still making it easy for non-technical teachers to write posts?

[side comment: ckEditor has some nice-looking django integrations, including image upload and browse packages. good alt to TinyMCE. Not sure if this helps the XSS problems though!] — Spacedman, Jul 26 '11 at 14:58

score 16 · Answer 1 · answered Jul 15 '12 at 00:45

This is late, but you can try Bleach. Under the hood it uses html5lib, and you also get tag balancing.

Here is a complete snippet:

settings.py

BLEACH_VALID_TAGS = ['p', 'b', 'i', 'strike', 'ul', 'li', 'ol', 'br',
                     'span', 'blockquote', 'hr', 'a', 'img']
BLEACH_VALID_ATTRS = {
    'span': ['style', ],
    'p': ['align', ],
    'a': ['href', 'rel'],
    'img': ['src', 'alt', 'style'],
}
BLEACH_VALID_STYLES = ['color', 'cursor', 'float', 'margin']

app/forms.py

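import bleach
from django import forms
from django.conf import settings

# MyModel and MyWYSIWYGEditor stand in for your own model and editor widget.


class MyModelForm(forms.ModelForm):
    myfield = forms.CharField(widget=MyWYSIWYGEditor)

    class Meta:
        model = MyModel

    def clean_myfield(self):
        myfield = self.cleaned_data.get('myfield', '')
        # Sanitize the submitted HTML against the whitelists in settings.py.
        # The styles argument exists in Bleach releases before 5.0.
        return bleach.clean(
            myfield,
            tags=settings.BLEACH_VALID_TAGS,
            attributes=settings.BLEACH_VALID_ATTRS,
            styles=settings.BLEACH_VALID_STYLES,
        )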

Read the Bleach docs to adapt it to your needs.

score 8 · Accepted Answer · answered Jul 26 '11 at 13:36

You need to parse the HTML on the server and remove any tags and attributes that don't meet a strict whitelist.
You should parse it (or at least re-render it) as strict XML to prevent attackers from exploiting differences between fuzzy parsers.

The whitelist must not include <script>, <style>, <link>, or <meta>, and must not include event handler attributes or style="".
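
As a rough sketch of the idea above (not a production sanitizer), the snippet below parses a fragment as strict XML and reports anything outside a whitelist. It rejects bad input instead of removing it, which is a simplification. The names ALLOWED_TAGS, ALLOWED_ATTRS and find_violations are made up for this example, and the whitelist contents are only an assumption about what an editor might emit. Note that HTML-only entities such as &nbsp; are not valid XML and would be reported as not well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: no <script>, <style>, <link>, <meta>,
# no event handler attributes and no style attribute.
ALLOWED_TAGS = {"p", "b", "i", "ul", "ol", "li", "br", "a", "img"}
ALLOWED_ATTRS = {"a": {"href", "rel"}, "img": {"src", "alt"}}


def find_violations(fragment):
    """Return a list of problems; an empty list means the fragment passed."""
    try:
        # Wrap the fragment so that several top-level elements still parse.
        root = ET.fromstring("<root>" + fragment + "</root>")
    except ET.ParseError as exc:
        return ["not well-formed XML: %s" % exc]

    problems = []
    for element in root.iter():
        if element is root:
            continue  # skip the artificial wrapper
        if element.tag not in ALLOWED_TAGS:
            problems.append("tag <%s> is not allowed" % element.tag)
            continue
        allowed = ALLOWED_ATTRS.get(element.tag, set())
        for name in element.attrib:
            if name not in allowed:
                problems.append(
                    "attribute %s on <%s> is not allowed" % (name, element.tag)
                )
    return problems
```

Comparison is case-sensitive on purpose, so anything that does not match the whitelist exactly is reported.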

You must also parse URLs in href="" and src="" and make sure that they are either relative paths, http://, or https://.
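
A minimal sketch of such a URL check using only the standard library might look like this. The function name is_safe_url is hypothetical, and rejecting protocol-relative URLs (//host/path) and backslashes is an extra assumption made here for strictness, not something required by the rule above.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_safe_url(url):
    """Return True for relative paths and http(s) URLs, False otherwise."""
    # Browsers ignore whitespace and control characters in URLs, so
    # "java\tscript:..." must be treated the same as "javascript:...".
    cleaned = "".join(ch for ch in url if ord(ch) > 0x20)
    # Browsers treat backslashes like slashes; reject them outright.
    if "\\" in cleaned:
        return False
    try:
        parts = urlsplit(cleaned)
    except ValueError:
        return False
    if parts.scheme:
        return parts.scheme.lower() in ALLOWED_SCHEMES
    # No scheme: accept plain relative paths, reject //host/path.
    return not parts.netloc
```

You would call this on every href and src value that survives the tag and attribute whitelist, and drop the attribute when it returns False.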

score 0 · Answer 3 · edited May 23 '17 at 12:34

@SLaks is right that you need to do the sanitization on the server since students who steal a teacher's credentials could use those credentials to POST directly to your server.

The question "Python HTML sanitizer / scrubber / filter" discusses existing HTML sanitizers available for Python.

I would suggest starting with an empty whitelist, then using the WYSIWYG editor to create a snippet of HTML with each button so that you know the varieties of HTML it produces, and then whitelisting only the tags and attributes needed to support that HTML. Hopefully it doesn't use the CSS style attribute, because that can also be an XSS vector.
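
To take that inventory without reading the markup by hand, something like the following could help. It is only a sketch built on the standard library's html.parser; the names TagInventory and inventory are invented for this example, and the sample string stands in for whatever your editor actually emits.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TagInventory(HTMLParser):
    """Collect every tag name and the attribute names seen on it."""

    def __init__(self):
        super().__init__()
        self.seen = defaultdict(set)

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <br />.
        self.seen[tag].update(name for name, _value in attrs)


def inventory(html):
    """Return {tag: sorted attribute names} for a snippet of editor output."""
    parser = TagInventory()
    parser.feed(html)
    parser.close()
    return {tag: sorted(attrs) for tag, attrs in parser.seen.items()}


if __name__ == "__main__":
    sample = '<p align="center">Hi <b>there</b><br /><a href="/x" rel="nofollow">link</a></p>'
    for tag, attrs in sorted(inventory(sample).items()):
        print(tag, attrs)
```

The resulting mapping is a starting point for the whitelist; each tag and attribute in it still needs to be reviewed before being allowed.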

KingRanTheMan · Answer 4 · 2023-01-09T11:23:46.860

Adding to Nitely's answer, which was great but slightly incomplete: I also recommend using Bleach, but if you want to use it to pre-approve safe CSS styles you need the Bleach CSS Sanitizer (a separate pip install from the vanilla bleach package), which makes for a slightly different code set-up from Nitely's.

We use the below in our Django project forms.py file (using Django-CKEditor as the content widget) to sanitize the data for our user-input ReportPages.

import bleach
from bleach.css_sanitizer import CSSSanitizer
from ckeditor.widgets import CKEditorWidget
from django import forms
from django.conf import settings

from .models import ReportPage  # adjust to wherever ReportPage lives

css_sanitizer = CSSSanitizer(allowed_css_properties=settings.BLEACH_VALID_STYLES)


class ReportPageForm(forms.ModelForm):
    content = forms.CharField(widget=CKEditorWidget())

    class Meta:
        model = ReportPage
        fields = ('name', 'content')

    def clean_content(self):
        content = self.cleaned_data['content']
        cleaned_content = bleach.clean(
            content,
            tags=settings.BLEACH_VALID_TAGS,
            attributes=settings.BLEACH_VALID_ATTRS,
            protocols=settings.BLEACH_VALID_PROTOCOLS,
            css_sanitizer=css_sanitizer,
            strip=True,
        )
        # Django keeps whatever clean_<field>() returns, so return the sanitized HTML.
        return cleaned_content

We include strip=True so that disallowed mark-up is removed from the form content instead of being escaped. We also pass protocols so that any href attrs (for 'a' tags) and src attrs (for 'img' tags) must use https. By default Bleach also allows http and mailto, which we wanted turned off.

For completeness' sake, inside our settings.py file we define the following as valid mark-up for our purposes:

BLEACH_VALID_TAGS = (
    'a', 'abbr', 'acronym', 'b', 'blockquote', 'br', 'code', 
    'dd', 'div', 'dt', 'em', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 
    'hr', 'i', 'img', 'li', 'ol', 'p', 'pre', 'span', 'strike', 
    'strong', 'sub', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 
    'thead', 'tr', 'tt', 'u', 'ul'
)
    
BLEACH_VALID_ATTRS = {
    '*': ['style', ], # allow all tags to have style attr
    'p': ['align', ],
    'a': ['href', 'rel'],
    'img': ['src', 'alt', 'style'],
}

BLEACH_VALID_STYLES = (
    'azimuth', 'background-color', 'border', 'border-bottom-color',
    'border-collapse', 'border-color', 'border-left-color',
    'border-right-color', 'border-top-color', 'clear',
    'color','cursor', 'direction', 'display', 'elevation', 'float',
    'font', 'font-family','font-size', 'font-style', 'font-variant',
    'font-weight', 'height', 'letter-spacing', 'line-height', 
    'margin', 'margin-bottom', 'margin-left', 'margin-right', 
    'margin-top', 'overflow', 'padding', 'padding-bottom', 
    'padding-left', 'padding-right', 'padding-top', 'pause', 
    'pause-after', 'pause-before', 'pitch', 'pitch-range',
    'richness', 'speak', 'speak-header', 'speak-numeral',
    'speak-punctuation', 'speech-rate', 'stress', 'text-align',
    'text-decoration', 'text-indent', 'unicode-bidi', 
    'vertical-align', 'voice-family', 'volume', 'white-space', 'width'
)

BLEACH_VALID_PROTOCOLS = ('https',)
