Sanitize HTML filter with tag/attribute whitelist and XSS protection

Author: harrym
Posted: July 27, 2009
Language: Python
Version: 1.0
Tags: html security sanitize whitelist
Score: 0 (after 2 ratings)

A reworked version of this snippet that accepts an argument specifying which tags to allow and which attributes to allow on each tag. The argument should be in the form tag1:attr1:attr2 tag2:attr1 tag3, where each tag is an allowed HTML tag and each attr is an attribute allowed on that tag.
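As a rough illustration of how the argument string maps to a per-tag whitelist, here is a minimal standalone sketch of the parsing step. The helper name parse_allowed_tags and the sample spec are hypothetical and used only for this example; the snippet itself does the same thing inline.

```python
def parse_allowed_tags(spec):
    """Turn 'tag1:attr1:attr2 tag2:attr1 tag3' into {tag: [attrs]}.

    Hypothetical helper for illustration; not part of the snippet's API.
    """
    # Each whitespace-separated item is 'tag' followed by zero or more ':attr'.
    items = [item.split(':') for item in spec.split()]
    # The first element is the tag name, the rest are its allowed attributes.
    return dict((item[0], item[1:]) for item in items)


whitelist = parse_allowed_tags('a:href:title img:src p')
print(whitelist)
```

A tag listed without attributes (p here) is allowed, but every attribute on it is dropped.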

It also uses code from this post on Stack Overflow to add XSS protection.
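To see what that XSS protection actually matches, the regular expression can be tried on its own with only the standard library. This is a sketch with made-up sample attribute values; note that the pattern is compiled without re.IGNORECASE, so mixed-case spellings are not matched.

```python
import re

# Same construction as in the snippet: the letters of 'javascript' joined by
# a sub-pattern that tolerates whitespace and hex character references.
js_regex = re.compile(r'[\s]*(&#x.{1,7})?'.join(list('javascript')))

# Sample values are illustrative only.
plain = js_regex.sub('', 'javascript:alert(1)')
spaced = js_regex.sub('', 'java\tscript:alert(1)')
entity = js_regex.sub('', 'jav&#x0A;ascript:alert(1)')
mixed_case = js_regex.sub('', 'JavaScript:alert(1)')

print(plain)       # ':alert(1)'
print(spaced)      # ':alert(1)'
print(entity)      # ':alert(1)'
print(mixed_case)  # unchanged, because the match is case-sensitive
```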

from django import template
from BeautifulSoup import BeautifulSoup, Comment
import re

register = template.Library()

def sanitize(value, allowed_tags):
    """Argument should be in form 'tag1:attr1:attr2 tag2:attr1 tag3', where tags
    are allowed HTML tags, and attrs are the allowed attributes for that tag.
    """
    # Matches 'javascript' even when its letters are separated by whitespace
    # or hex character references such as '&#x0A;'.
    js_regex = re.compile(r'[\s]*(&#x.{1,7})?'.join(list('javascript')))
    # Parse the argument into a dict of {tag_name: [allowed_attr, ...]}.
    allowed_tags = [tag.split(':') for tag in allowed_tags.split()]
    allowed_tags = dict((tag[0], tag[1:]) for tag in allowed_tags)

    soup = BeautifulSoup(value)
    # Remove HTML comments.
    for comment in soup.findAll(text=lambda text: isinstance(text, Comment)):
        comment.extract()

    for tag in soup.findAll(True):
        if tag.name not in allowed_tags:
            # Drop the tag itself but still render its contents.
            tag.hidden = True
        else:
            # Keep only whitelisted attributes and strip 'javascript' from values.
            tag.attrs = [(attr, js_regex.sub('', val)) for attr, val in tag.attrs
                         if attr in allowed_tags[tag.name]]

    return soup.renderContents().decode('utf8')

register.filter(sanitize)


Comments

ronnie (on May 20, 2011):

This script does not protect against XSS attacks.

Try the following string: <script><script type="text/javascript">alert("ok");<</script>/script>

It results in: <script type="text/javascript">alert("ok");</script>
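The effect described above can be reproduced in isolation. The sketch below is not the snippet's code: it assumes Python 3 and uses the standard library's html.parser as a stand-in for "hide the tag but keep its contents", with a hypothetical strip_tags helper. It shows the surviving text fragments joining back into a script tag, and one possible mitigation (escaping the text that is kept).

```python
from html import escape
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Hypothetical stand-in: discard every tag, keep only text content."""

    def __init__(self, escape_text=False):
        super().__init__()
        self.escape_text = escape_text
        self.parts = []

    def handle_data(self, data):
        # Start and end tags produce no output; only text is collected.
        self.parts.append(escape(data) if self.escape_text else data)


def strip_tags(value, escape_text=False):
    parser = TextOnly(escape_text)
    parser.feed(value)
    parser.close()
    return ''.join(parser.parts)


payload = '<script><script type="text/javascript">alert("ok");<</script>/script>'

# The outer tags are removed, and the leftover text pieces form a new tag.
unsafe = strip_tags(payload)
# Escaping the retained text keeps it from being read as markup.
safe = strip_tags(payload, escape_text=True)

print(unsafe)
print(safe)
```

Escaping retained text is only one option; removing disallowed elements together with their contents is another, at the cost of losing legitimate text inside them.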
