
Python-Markdown includes features, such as escaping raw HTML, that are evidently intended to make it safe on untrusted input, and Markdown is commonly used to render user input, as it is right here on SO.

But is this implementation really trustworthy? Has anyone here studied it to decide it's safe to run on arbitrary input?

I see there are, e.g., "Markdown in Django XSS safe" and "Secure Python Markdown Library", but is 'safe' mode really safe?

poolie
  • Do you have any reason to believe that it isn't? The safe mode in Django should work perfectly if you supply it correctly as it's a fairly mature framework and a lot of people use it. If there were any obvious security risks, people would have found them by now. – Codahk Dec 07 '11 at 01:35
  • it's safe. but whether it's "really safe" depends on your definition of "really safe" – Dmitry B. Dec 07 '11 at 01:35

2 Answers

5

The Python Markdown library appears to be safe as far as anyone knows, if you use it properly. See the link for details about how to use it safely, but the short version is: it is important to use the latest version, to set safe_mode, and to set enable_attributes=False.

Update: safe_mode is now due to be deprecated, because of the security problems with it. See https://github.com/Python-Markdown/markdown/commit/7db56daedf8a6006222f55eeeab748e7789fba89. Instead, use a separate HTML sanitizer, such as HTML Purifier.

D.W.
  • safe_mode has been deprecated https://pythonhosted.org/Markdown/reference.html#safe_mode – Richard Jones Dec 08 '14 at 04:48
  • That link is now broken, but https://github.com/Python-Markdown/markdown/commit/7db56daedf8a6006222f55eeeab748e7789fba89 explains the deprecation. Anyhow, I'm glad I asked! – poolie Apr 11 '19 at 04:25
  • @D.W. I think this answer, although helpful at the time, is now outdated. The discussion around that deprecation seems to be that it's actually not safe against untrusted input and you need to use other protection. – poolie Apr 13 '19 at 00:43
  • @poolie, You raise a good point. That sounds plausible, but I have a query: do you know what the actual security problem is? I haven't been able to find any information on that. If I can find evidence of that, I'll replace my answer with something recommending alternative HTML sanitizers. – D.W. Apr 13 '19 at 05:30
  • It's vulnerable to XSS: Markdown entered by an attacker and rendered to another user could run JavaScript in your site's context, or have other side effects. [Here's a general discussion of XSS in markdown](https://michelf.ca/blog/2010/markdown-and-xss/), and the advice there of scrubbing the rendered output is consistent with the new Python-Markdown docs. So, perhaps it could be said that Python-Markdown is insecure only in the same way as any renderer: they should all have the output scrubbed. – poolie Apr 13 '19 at 05:47
  • @poolie, Thanks for your comments. I'm still confused. That blog entry doesn't mention safe mode. It gives two example attacks. The first example attack doesn't seem a problem for safe mode. As far as I know, the second attack no longer works (browsers don't interpret `javascript:` urls any longer), and I think it might be blocked by `enable_attributes=False`. So can you explain how you know that safe mode is vulnerable? I'm not seeing it yet. – D.W. Apr 13 '19 at 06:36
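
For what it's worth, the `javascript:` link concern discussed in these comments is usually handled by checking link targets against a scheme allowlist rather than by relying on browser behaviour. Below is a minimal sketch of that idea using only the standard library. The names `ALLOWED_SCHEMES` and `is_safe_href` are hypothetical, the allowlist is an assumption, and the function assumes it receives an already entity-decoded attribute value. It is an illustration, not a replacement for a maintained sanitizer.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for this sketch; adjust it to the link types your site needs.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def is_safe_href(href: str) -> bool:
    """Return True if href is scheme-less (relative) or uses an allowlisted scheme."""
    # Drop spaces and control characters so that a value such as
    # "java\tscript:..." cannot slip past the scheme check.
    cleaned = "".join(ch for ch in href if ch > " ")
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:
        # Malformed URLs (e.g. unbalanced IPv6 brackets) are rejected outright.
        return False
    return scheme == "" or scheme in ALLOWED_SCHEMES
```

A check like this would be applied to every `href` in the rendered HTML; anything that fails would have the attribute dropped.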
0

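You can use bleach to sanitize the HTML:

```python
import bleach

# Untrusted HTML: a harmless link followed by a script injection attempt.
text = "<a href='https://example.com'>Example</a><script>alert('message');</script>"

# Keep only allowlisted tags and attributes; by default bleach escapes everything else.
sanitized_text = bleach.clean(
    text,
    tags=['p', 'a', 'code', 'pre', 'blockquote'],
    attributes={'code': ['class'], 'a': ['href']},
)
```

Read the bleach documentation for more.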
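
To tie this back to the question, the same call can be applied to the output of Python-Markdown instead of to hand-written HTML. The following is a rough sketch of that pipeline, assuming both the `markdown` and `bleach` packages are installed. The helper name `render_untrusted` and the particular allowlists are hypothetical choices for illustration, not something either library prescribes.

```python
import bleach
import markdown

# Hypothetical allowlists for this sketch; tune them to the Markdown features you enable.
ALLOWED_TAGS = {"p", "a", "code", "pre", "blockquote", "em", "strong", "ul", "ol", "li"}
ALLOWED_ATTRIBUTES = {"a": ["href"], "code": ["class"]}


def render_untrusted(source: str) -> str:
    """Render Markdown to HTML, then sanitize the rendered HTML."""
    html = markdown.markdown(source)
    # Sanitize after rendering, so that HTML produced by Markdown syntax
    # and raw HTML typed by the user are both filtered.
    return bleach.clean(html, tags=ALLOWED_TAGS, attributes=ALLOWED_ATTRIBUTES)


rendered = render_untrusted("**hi** <script>alert('x')</script>")
```

The order matters here: cleaning the Markdown source before rendering would miss markup that the renderer itself generates, which is why the sanitizer runs last.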