8👍
You need to parse the HTML on the server and remove any tags and attributes that don’t meet a strict whitelist.
You should parse it (or at least re-render it) as strict XML to prevent attackers from exploiting differences between fuzzy parsers.
The whitelist must not include <script>, <style>, <link>, or <meta>, and must not include event handler attributes or style="".
You must also parse URLs in href="" and src="" and make sure that they are either relative paths, http://, or https://.
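As a rough illustration of the URL rule above, here is a minimal sketch using only the standard library. The function name is_safe_url and the decision to reject protocol-relative URLs (//host/path) and backslashes are my own assumptions, not part of any library; a vetted sanitizer is still the safer choice in production.

```python
from urllib.parse import urlsplit

# Schemes the answer allows; everything else (javascript:, data:, ...) is rejected.
ALLOWED_SCHEMES = {"http", "https"}

# Browsers ignore leading control characters and spaces in URLs (0x00-0x20).
_LEADING_JUNK = "".join(chr(c) for c in range(0x21))


def is_safe_url(url: str) -> bool:
    """Return True for relative paths and http(s) URLs, False otherwise."""
    # Browsers drop tabs and newlines anywhere in a URL, so "java\tscript:"
    # must be treated the same as "javascript:".
    cleaned = "".join(ch for ch in url if ch not in "\t\r\n")
    cleaned = cleaned.strip().lstrip(_LEADING_JUNK)

    # Browsers may treat backslashes as slashes, so "\\host" can leave the
    # site; being conservative, reject any backslash (an assumption).
    if "\\" in cleaned:
        return False

    try:
        parts = urlsplit(cleaned)
    except ValueError:
        # Malformed URLs (e.g. a broken IPv6 host) are rejected outright.
        return False

    if parts.scheme:
        return parts.scheme.lower() in ALLOWED_SCHEMES

    # No scheme: accept only genuine relative paths, not "//host/path".
    return not parts.netloc
```

You would call this for every href and src value that survives the tag and attribute whitelist, and drop the attribute when it returns False.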
16👍
This is late, but you can try Bleach. Under the hood it uses html5lib, and you’ll also get tag balancing.
Here is a complete snippet:
settings.py
BLEACH_VALID_TAGS = ['p', 'b', 'i', 'strike', 'ul', 'li', 'ol', 'br',
                     'span', 'blockquote', 'hr', 'a', 'img']
BLEACH_VALID_ATTRS = {
    'span': ['style'],
    'p': ['align'],
    'a': ['href', 'rel'],
    'img': ['src', 'alt', 'style'],
}
BLEACH_VALID_STYLES = ['color', 'cursor', 'float', 'margin']
app/forms.py
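import bleach
from django import forms
from django.conf import settings

# MyModel and MyWYSIWYGEditor are placeholders for your own model and editor widget.
class MyModelForm(forms.ModelForm):
    myfield = forms.CharField(widget=MyWYSIWYGEditor)

    class Meta:
        model = MyModel
        fields = ['myfield']

    def clean_myfield(self):
        myfield = self.cleaned_data.get('myfield', '')
        # styles= works only in bleach < 5.0; newer releases use css_sanitizer=.
        cleaned_text = bleach.clean(myfield, tags=settings.BLEACH_VALID_TAGS,
                                    attributes=settings.BLEACH_VALID_ATTRS,
                                    styles=settings.BLEACH_VALID_STYLES)
        return cleaned_text  # sanitized HTML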
Read the Bleach docs to adapt it to your needs.
1👍
Adding to Nitely’s answer, which was great but slightly incomplete: I also recommend Bleach, but if you want to use it to pre-approve safe CSS styles, you need Bleach’s CSS sanitizer (an optional extra, installed separately with pip install "bleach[css]"), which makes for a slightly different code set-up from Nitely’s.
We use the below in our Django project forms.py file (using Django-CKEditor as the content widget) to sanitize the data for our user-input ReportPages.
import bleach
from bleach.css_sanitizer import CSSSanitizer
from ckeditor.widgets import CKEditorWidget
from django import forms
from django.conf import settings

from .models import ReportPage  # assumes ReportPage lives in this app's models.py

css_sanitizer = CSSSanitizer(allowed_css_properties=settings.BLEACH_VALID_STYLES)

class ReportPageForm(forms.ModelForm):
    content = forms.CharField(widget=CKEditorWidget())

    class Meta:
        model = ReportPage
        fields = ('name', 'content')

    def clean_content(self):
        content = self.cleaned_data['content']
        cleaned_content = bleach.clean(
            content,
            tags=settings.BLEACH_VALID_TAGS,
            attributes=settings.BLEACH_VALID_ATTRS,
            protocols=settings.BLEACH_VALID_PROTOCOLS,
            css_sanitizer=css_sanitizer,
            strip=True
        )
        # Django stores whatever clean_<field>() returns, so return the cleaned value.
        return cleaned_content
We include strip=True so that disallowed mark-up is removed from the form content instead of being escaped (escaping is Bleach’s default). We also include protocols so that any href attrs (for ‘a’ tags) and src attrs (for ‘img’ tags) must be https (http and mailto are also allowed by default, which we wanted turned off).
For completeness’ sake, inside our settings.py file we define the following as valid mark-up for our purposes:
BLEACH_VALID_TAGS = (
    'a', 'abbr', 'acronym', 'b', 'blockquote', 'br', 'code',
    'dd', 'div', 'dt', 'em', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
    'hr', 'i', 'img', 'li', 'ol', 'p', 'pre', 'span', 'strike',
    'strong', 'sub', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th',
    'thead', 'tr', 'tt', 'u', 'ul'
)
BLEACH_VALID_ATTRS = {
    '*': ['style'],  # allow all tags to have style attr
    'p': ['align'],
    'a': ['href', 'rel'],
    'img': ['src', 'alt', 'style'],
}
BLEACH_VALID_STYLES = (
    'azimuth', 'background-color', 'border', 'border-bottom-color',
    'border-collapse', 'border-color', 'border-left-color',
    'border-right-color', 'border-top-color', 'clear',
    'color', 'cursor', 'direction', 'display', 'elevation', 'float',
    'font', 'font-family', 'font-size', 'font-style', 'font-variant',
    'font-weight', 'height', 'letter-spacing', 'line-height',
    'margin', 'margin-bottom', 'margin-left', 'margin-right',
    'margin-top', 'overflow', 'padding', 'padding-bottom',
    'padding-left', 'padding-right', 'padding-top', 'pause',
    'pause-after', 'pause-before', 'pitch', 'pitch-range',
    'richness', 'speak', 'speak-header', 'speak-numeral',
    'speak-punctuation', 'speech-rate', 'stress', 'text-align',
    'text-decoration', 'text-indent', 'unicode-bidi',
    'vertical-align', 'voice-family', 'volume', 'white-space', 'width'
)
BLEACH_VALID_PROTOCOLS = ('https',)
0👍
@SLaks is right that you need to do the sanitization on the server since students who steal a teacher’s credentials could use those credentials to POST directly to your server.
The question “Python HTML sanitizer / scrubber / filter” discusses existing HTML sanitizers available for Python.
I would suggest starting with an empty whitelist, then using the WYSIWYG editor to create a snippet of HTML with each button so that you know the varieties of HTML it produces, and then whitelisting only the tags/attributes needed to support that HTML. Hopefully it doesn’t use the CSS style attribute, because that can also be an XSS vector.
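To make that survey less tedious, something like the following could list the tags and attributes that appear in the snippets your editor produces. This is only a sketch built on the standard library’s html.parser; the names TagAttrCollector and collect_markup and the sample snippets are made up for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TagAttrCollector(HTMLParser):
    """Record every tag seen and the attribute names used on it."""

    def __init__(self):
        super().__init__()
        self.seen = defaultdict(set)

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        # Self-closing tags such as <br/> also end up here by default.
        self.seen[tag].update(name for name, _value in attrs)


def collect_markup(samples):
    """Return {tag: sorted attribute names} across all sample snippets."""
    merged = defaultdict(set)
    for sample in samples:
        collector = TagAttrCollector()  # fresh parser per snippet
        collector.feed(sample)
        collector.close()
        for tag, attrs in collector.seen.items():
            merged[tag].update(attrs)
    return {tag: sorted(attrs) for tag, attrs in sorted(merged.items())}


# Hypothetical snippets, one per editor button you pressed.
editor_samples = [
    '<p style="text-align: center"><strong>Hi</strong> '
    '<a href="https://example.com" rel="nofollow">link</a></p>',
    '<ul><li>one</li></ul><br/>',
]
candidate_whitelist = collect_markup(editor_samples)
```

The resulting dictionary is only a starting point: review it by hand and drop anything risky (style, event handlers) before turning it into your whitelist.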