Using safe filter in Django for rich text fields

15👍

Use django-bleach. It provides a bleach template filter that strips everything except the tags you explicitly allow:

{% load bleach_tags %}
{{ mymodel.my_html_field|bleach }}

The trick is to configure the editor to produce the same tags as you’re willing to ‘let through’ in your bleach settings.

Here’s an example of my bleach settings:

# Which HTML tags are allowed
BLEACH_ALLOWED_TAGS = ['p', 'h3', 'h4', 'em', 'strong', 'a', 'ul', 'ol', 'li', 'blockquote']
# Which HTML attributes are allowed
BLEACH_ALLOWED_ATTRIBUTES = ['href', 'title', 'name']
BLEACH_STRIP_TAGS = True

Then you can configure TinyMCE (or whatever WYSIWYG editor you’re using) to offer only the buttons that create the allowed tags.
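
For a feel of what those settings mean, here is a minimal, stdlib-only sketch of the same allow-list idea. It is not how django-bleach is implemented and is no substitute for it; the names AllowListSanitizer, sanitize and ALLOWED_SCHEMES are assumptions made purely for illustration.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Mirrors the tag and attribute settings above.
ALLOWED_TAGS = {"p", "h3", "h4", "em", "strong", "a", "ul", "ol", "li", "blockquote"}
ALLOWED_ATTRIBUTES = {"href", "title", "name"}
# Assumption of this sketch: which URL schemes an href may use ("" = relative).
ALLOWED_SCHEMES = {"", "http", "https", "mailto"}


class AllowListSanitizer(HTMLParser):
    """Rebuild the markup from scratch, keeping only allow-listed pieces."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag; any text inside it is still escaped below
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRIBUTES or value is None:
                continue
            if name == "href" and not self._safe_url(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.parts.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always re-escaped, so it can never turn back into markup.
        self.parts.append(escape(data, quote=False))

    @staticmethod
    def _safe_url(value):
        # Remove whitespace and control characters before reading the scheme,
        # so "java<TAB>script:" is not mistaken for a relative URL.
        cleaned = "".join(ch for ch in value if ch > " ")
        try:
            return urlsplit(cleaned).scheme.lower() in ALLOWED_SCHEMES
        except ValueError:
            return False


def sanitize(html):
    parser = AllowListSanitizer()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(sanitize('<p onclick="x()">Hi <strong>there</strong></p>'))
# <p>Hi <strong>there</strong></p>
print(sanitize('<a href="javascript:alert(1)" title="t">x</a>'))
# <a title="t">x</a>
```

The sketch does not balance unclosed tags or handle the many parser quirks a maintained library deals with, which is the reason to rely on bleach rather than something hand-rolled.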

14👍

You are right to be concerned about raw HTML, but not just for JavaScript-disabled browsers. When considering the security of your server, you have to ignore any work done in the browser, and look solely at what the server accepts and what happens to it. Your server accepts HTML and displays it on the page. This is unsafe.

The fact that TinyMCE quotes HTML gives a false sense of security: that quoting happens in the browser, and an attacker can send a request to the server without going through the editor at all. The server trusts what it accepts, which it should not.

The solution to this is to process the HTML when it arrives, to remove dangerous constructs. This is a complicated problem to solve. Take a look at the XSS Cheat Sheet to see the wide variety of inputs that could cause a problem.

lxml has a function to clean HTML: http://lxml.de/lxmlhtml.html#cleaning-up-html, but I’ve never used it, so I can’t vouch for its quality.

6👍

You can use the template filter “removetags” and just remove ‘script’.

Note that removetags was deprecated in Django 1.8 and removed in Django 1.10, so it is not available in current versions. Here is the deprecation notice from the docs:

Deprecated since version 1.8: removetags cannot guarantee HTML safe output and has been deprecated due to security concerns. Consider using bleach instead.
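
To see why removing only ‘script’ cannot guarantee safe output, here is a small sketch of a denylist filter and a few inputs that get past it. strip_script_tags is a hypothetical helper written for this illustration; it is not Django’s removetags implementation.

```python
import re

SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_script_tags(html):
    """Naive denylist: delete <script>...</script> blocks and nothing else."""
    return SCRIPT_BLOCK.sub("", html)


# The obvious case is handled.
print(strip_script_tags("<p>hi</p><script>alert(1)</script>"))
# <p>hi</p>

# Event-handler attributes need no <script> tag at all.
print(strip_script_tags('<img src="x" onerror="alert(1)">'))
# <img src="x" onerror="alert(1)">

# Neither do javascript: URLs.
print(strip_script_tags('<a href="javascript:alert(1)">click</a>'))
# <a href="javascript:alert(1)">click</a>

# Removing the inner block reassembles a working script tag.
print(strip_script_tags("<scr<script></script>ipt>alert(1)</script>"))
# <script>alert(1)</script>
```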

3👍

There isn’t a good answer to this one. TinyMCE generates HTML, and Django’s auto-escaping specifically escapes HTML, so the markup shows up as literal text instead of being rendered.
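
The conflict is easy to see with Python’s standard html.escape, used here as a stand-in for what template auto-escaping does to a value (an assumption of this sketch; no Django is involved):

```python
from html import escape

# Output a rich text editor might produce.
rich_text = "<p>Hello <strong>world</strong></p>"
print(escape(rich_text))
# &lt;p&gt;Hello &lt;strong&gt;world&lt;/strong&gt;&lt;/p&gt;

# The same escaping is what neutralises a malicious value.
malicious = "<script>alert(1)</script>"
print(escape(malicious))
# &lt;script&gt;alert(1)&lt;/script&gt;
```

Escaping makes both values harmless but also shows the tags as text, while marking the value safe renders both of them as HTML, the script included.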

The traditional solution to this problem has been either to use a non-HTML markup language on the user input side (BBCode, Markdown, etc.) or to whitelist a limited number of HTML tags. TinyMCE and raw HTML are generally only appropriate input solutions for more or less trusted users.

The whitelist approach is tricky to implement without any security holes. The one thing you don’t want to do is try to just detect “bad” tags – you WILL miss edge cases.
