15👍
Use django-bleach. It provides a bleach template filter that strips everything except the tags you choose to allow:
{% load bleach_tags %}
{{ mymodel.my_html_field|bleach }}
The trick is to configure the editor to produce the same tags as you’re willing to ‘let through’ in your bleach settings.
Here’s an example of my bleach settings:
# Which HTML tags are allowed
BLEACH_ALLOWED_TAGS = ['p', 'h3', 'h4', 'em', 'strong', 'a', 'ul', 'ol', 'li', 'blockquote']
# Which HTML attributes are allowed
BLEACH_ALLOWED_ATTRIBUTES = ['href', 'title', 'name']
BLEACH_STRIP_TAGS = True
Then you can configure TinyMCE (or whatever WYSIWYG editor you’re using) to offer only the buttons that create the allowed tags.
14👍
You are right to be concerned about raw HTML, but not just for JavaScript-disabled browsers. When considering the security of your server, you have to ignore any work done in the browser, and look solely at what the server accepts and what happens to it. Your server accepts HTML and displays it on the page. This is unsafe.
The fact that TinyMCE quotes HTML gives a false sense of security: the server trusts what it accepts, which it should not.
The solution to this is to process the HTML when it arrives, to remove dangerous constructs. This is a complicated problem to solve. Take a look at the XSS Cheat Sheet to see the wide variety of inputs that could cause a problem.
lxml has a function to clean HTML: http://lxml.de/lxmlhtml.html#cleaning-up-html, but I’ve never used it, so I can’t vouch for its quality.
6👍
You can use the template filter “removetags” and just remove ‘script’.
Note that removetags is no longer available in Django 2.0 (it was deprecated in 1.8 and removed in 1.10). Here is the deprecation notice from the docs:
Deprecated since version 1.8: removetags cannot guarantee HTML safe output and has been deprecated due to security concerns. Consider using bleach instead.
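To see why stripping only ‘script’ is not enough, here is a minimal sketch using only the Python standard library. The helper strip_script_tags is hypothetical (it is not Django’s removetags, just a stand-in for any “remove the script tag” approach), and the payloads are illustrative:

```python
import re

# Hypothetical helper: a naive blocklist that removes <script> elements only.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_script_tags(html_text: str) -> str:
    """Remove <script>...</script> blocks and leave everything else untouched."""
    return SCRIPT_RE.sub("", html_text)


payloads = [
    "<script>alert(1)</script>",                # removed
    '<img src="x" onerror="alert(1)">',         # event handler attribute survives
    '<a href="javascript:alert(1)">click</a>',  # javascript: URL survives
]

for payload in payloads:
    print(repr(strip_script_tags(payload)))
```

Only the first payload is neutralised; the other two contain no script tag at all, yet still execute JavaScript in a browser.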
3👍
There isn’t a good answer to this one. TinyMCE generates HTML, and Django’s auto-escaping deliberately escapes HTML, so the markup is shown as literal text instead of being rendered.
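As a rough illustration of that conflict, the sketch below uses Python’s standard html.escape as a stand-in for what auto-escaping does to a value when a template renders it. The sample editor output is made up:

```python
from html import escape

# Made-up example of what a WYSIWYG editor might submit.
editor_output = "<p>Hello <strong>world</strong></p>"

# Escaping turns markup characters into entities, so the browser
# displays the tags as text instead of interpreting them.
escaped = escape(editor_output)
print(escaped)
# &lt;p&gt;Hello &lt;strong&gt;world&lt;/strong&gt;&lt;/p&gt;
```

Marking the value as safe switches this protection off entirely, which is why the HTML has to be cleaned before it is trusted.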
The traditional solution to this problem has been either to use a non-HTML markup language on the user input side (BBCode, Markdown, etc.) or to whitelist a limited number of HTML tags. TinyMCE/HTML are generally only appropriate input solutions for more or less trusted users.
The whitelist approach is tricky to implement without any security holes. The one thing you don’t want to do is try to just detect “bad” tags – you WILL miss edge cases.
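To make the whitelist idea concrete, here is a small standard-library sketch. It is an illustration of the principle, not a vetted sanitizer: the class and function names are hypothetical, the tag and attribute sets are just examples, and the href prefix check is a deliberately conservative assumption. It also does not balance unclosed tags. For real use, prefer a maintained library such as bleach.

```python
from html import escape
from html.parser import HTMLParser

# Example allowlists; anything not listed is dropped.
ALLOWED_TAGS = {"p", "h3", "h4", "em", "strong", "a", "ul", "ol", "li", "blockquote"}
ALLOWED_ATTRIBUTES = {"href", "title", "name"}
# Assumption: only these link prefixes are accepted; everything else is dropped.
SAFE_HREF_PREFIXES = ("http://", "https://", "mailto:", "/", "#")


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowed tags and attributes; escape all text content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # unknown tag: drop it, but keep (escaped) text inside it
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRIBUTES or value is None:
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_HREF_PREFIXES):
                continue  # rejects javascript:, data:, and anything unrecognised
            kept.append(f' {name}="{escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))


def sanitize(html_text: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


dirty = (
    '<p onclick="evil()">Hi <a href="javascript:alert(1)" title="t">link</a>'
    "<script>alert(1)</script><img src=x onerror=alert(1)></p>"
)
print(sanitize(dirty))
```

The event handler, the javascript: link target, the script element and the image are all dropped without the code ever naming them, because nothing is emitted unless it is explicitly allowed. The text inside the script element survives only as harmless escaped text.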