38

Is there a generic "form sanitizer" that I can use to ensure all HTML/scripting is stripped from a submitted form? form.clean() doesn't seem to do any of that: HTML tags are all still in cleaned_data. Or is doing this manually (by overriding the form's clean() method) my only option?

djvg
abolotnov

3 Answers

55

strip_tags actually removes the tags from the input, which may not be what you want.

To convert a string to a "safe string" with angle brackets, ampersands and quotes converted to the corresponding HTML entities, you can use the escape function from django.utils.html:

from django.utils.html import escape
message = escape(form.cleaned_data['message'])
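To make the effect of escaping concrete, here is a minimal sketch that uses Python's standard-library html.escape as a stand-in, so it runs without a Django project. This is an assumption for illustration only: it converts the same characters, but Django's own escape may differ in details such as the return type. The sample string is made up.

```python
from html import escape  # standard library stand-in for django.utils.html.escape

# Hypothetical user input, as it might arrive in form.cleaned_data['message'].
raw = '<script>alert("hi")</script> Tom & Jerry'
escaped = escape(raw)

# Markup characters become entities, so a browser shows them as text
# instead of interpreting them.
print(escaped)
# &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt; Tom &amp; Jerry
```

Nothing the user typed is lost; it is only made inert, which is the difference from stripping the tags.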
djvg
simao
36

Django comes with a template filter called striptags, which you can use in a template:

value|striptags

It uses the strip_tags function from django.utils.html, which you can also call directly to clean your form data:

from django.utils.html import strip_tags
message = strip_tags(form.cleaned_data['message'])
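Since the question asks for something generic, the per-field call above could be applied to every text value at once. The following is only a rough sketch under stated assumptions: strip_tags_sketch is a hypothetical standard-library stand-in (not Django's implementation), sanitize_all is a made-up helper name, and a plain dict stands in for form.cleaned_data. In a real form you would presumably pass django.utils.html.strip_tags (or escape) as the sanitizer.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects only the text between tags (illustrative stand-in)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags_sketch(value):
    """Return value with tags removed; the text inside the tags is kept."""
    parser = _TextOnly()
    parser.feed(value)
    parser.close()
    return "".join(parser.chunks)


def sanitize_all(cleaned_data, sanitizer=strip_tags_sketch):
    """Apply sanitizer to every string value, leaving other types untouched."""
    return {
        key: sanitizer(value) if isinstance(value, str) else value
        for key, value in cleaned_data.items()
    }


# A plain dict standing in for form.cleaned_data.
data = {"message": "<b>hello</b> world", "age": 3}
result = sanitize_all(data)
print(result)
```

Making the sanitizer a parameter keeps the loop independent of whether you decide to strip, escape, or do both.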
Bernhard Vallant
  • "Note that strip_tags result may still contain unsafe HTML content, so you might use escape() to make it a safe string." - https://docs.djangoproject.com/en/dev/ref/utils/#django.utils.html.strip_tags – Collin Anderson Jan 22 '14 at 22:22
  • strip_tags() alone is insufficient, and the strip_tags() + escape() combination makes for really ugly text, especially where it legitimately contains apostrophes. Just use bleach.clean(). – Ivan May 28 '17 at 00:38
  • Is there any reason to strip tags? If a user is submitting tag-like stuff, it is probably better to escape it but keep it looking like the input. Say I enter stuff like `bad joke` – AnnanFay Feb 14 '18 at 05:26
35

Alternatively, there is a Python library called bleach:

Bleach is a whitelist-based HTML sanitization and text linkification library. It is designed to take untrusted user input with some HTML.

Because Bleach uses html5lib to parse document fragments the same way browsers do, it is extremely resilient to unknown attacks, much more so than regular-expression-based sanitizers.

Example:

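import bleach

# ALLOWED_TAGS, ALLOWED_ATTRIBUTES and ALLOWED_STYLES are allowlists you define yourself.
# Note: the styles argument was removed in Bleach 5.0 in favour of css_sanitizer.
message = bleach.clean(form.cleaned_data['message'],
                       tags=ALLOWED_TAGS,
                       attributes=ALLOWED_ATTRIBUTES,
                       styles=ALLOWED_STYLES,
                       strip=False, strip_comments=True)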
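The allowlists passed to bleach.clean above are not defined in the answer. As a sketch only, they might look like the following; the specific tags, attributes and CSS properties are hypothetical choices for a simple comment field, not values recommended by Bleach, and the exact container types accepted may vary between Bleach versions.

```python
# Hypothetical allowlists for a simple comment field; adjust to your needs.

# Tags that may appear in the output; everything else is escaped
# (strip=False) or removed (strip=True).
ALLOWED_TAGS = ["a", "b", "em", "i", "strong", "p"]

# Attributes allowed per tag; tags not listed here keep no attributes.
ALLOWED_ATTRIBUTES = {"a": ["href", "title"]}

# CSS properties allowed inside style attributes (only relevant to Bleach
# versions that still accept the styles argument).
ALLOWED_STYLES = ["color", "font-weight"]
```

Keeping these lists short is the point of an allowlist: anything you did not explicitly permit never reaches the page as markup.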
Wtower