Sanitize HTML filter with tag/attribute whitelist and XSS protection

from django import template
from BeautifulSoup import BeautifulSoup, Comment
import re

register = template.Library()

def sanitize(value, allowed_tags):
    """Argument should be in form 'tag1:attr1:attr2 tag2:attr1 tag3', where tags
    are allowed HTML tags, and attrs are the allowed attributes for that tag.
    """
    js_regex = re.compile(r'[\s]*(&#x.{1,7})?'.join(list('javascript')))
    allowed_tags = [tag.split(':') for tag in allowed_tags.split()]
    allowed_tags = dict((tag[0], tag[1:]) for tag in allowed_tags)

    soup = BeautifulSoup(value)
    # Remove HTML comments entirely.
    for comment in soup.findAll(text=lambda text: isinstance(text, Comment)):
        comment.extract()

    for tag in soup.findAll(True):
        if tag.name not in allowed_tags:
            # Hide the disallowed tag itself; its contents are still rendered.
            tag.hidden = True
        else:
            # Keep only whitelisted attributes and strip 'javascript' from values.
            tag.attrs = [(attr, js_regex.sub('', val)) for attr, val in tag.attrs
                         if attr in allowed_tags[tag.name]]

    return soup.renderContents().decode('utf8')

register.filter(sanitize)
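
The two pure-Python pieces of the filter, the argument parsing and the 'javascript' regex, can be tried without Django or BeautifulSoup. The following is a minimal sketch that copies those expressions from the snippet; the helper name parse_allowed_tags and the sample inputs are made up for illustration.

```python
import re

# Same pattern the snippet builds: the letters of 'javascript', each pair
# optionally separated by whitespace and/or a hex character reference.
js_regex = re.compile(r'[\s]*(&#x.{1,7})?'.join(list('javascript')))


def parse_allowed_tags(spec):
    """Hypothetical helper: turn 'tag1:attr1 tag2' into {'tag1': ['attr1'], 'tag2': []}."""
    parts = [item.split(':') for item in spec.split()]
    return dict((item[0], item[1:]) for item in parts)


# Sample whitelist: <a> with href/title, bare <b> and <i>, <img> with src.
allowed = parse_allowed_tags('a:href:title b i img:src')

# The regex removes obfuscated spellings of 'javascript' from attribute values.
spaced = js_regex.sub('', 'java script:alert(1)')
encoded = js_regex.sub('', 'jav&#x0A;ascript:alert(1)')

# The pattern is case-sensitive, so a mixed-case spelling passes through.
mixed_case = js_regex.sub('', 'JavaScript:alert(1)')
```

In a template the filter would be applied as {{ value|sanitize:"a:href:title b i img:src" }} after loading the tag library it lives in; the name of that library depends on the file the snippet is saved to, which the snippet does not specify. Note that the mixed-case value is left untouched, so compiling the pattern with re.IGNORECASE may be worth considering.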

More like this

  1. Sanitize HTML filter by henriklied 6 years, 12 months ago
  2. Sanitize text field HTML (here from the Dojo Toolkit Editor2 widget) by akaihola 7 years ago
  3. CleanCharField by DvD 6 years, 6 months ago
  4. keeptags: strip all HTML tags from output except a specified list of elements by chrominance 6 years, 10 months ago
  5. DaGood breadcrumbs by drozzy 5 years, 3 months ago

Comments

ronnie (on May 20, 2011):

This script does not protect against XSS attacks.

Try the following string: <script><script type="text/javascript">alert("ok");<</script>/script>

It results in: <script type="text/javascript">alert("ok");</script>
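
The general problem reported here is that removing markup in a single pass can leave fragments that join up into new markup. The sketch below illustrates that class of bypass with a toy regex stripper, which is an assumption made for the example and not the BeautifulSoup-based filter above, and shows one possible mitigation: repeat the pass until the output stops changing. The names strip_once and strip_until_stable are hypothetical.

```python
import re

# Toy stand-in for a single-pass sanitizer: deletes <script ...> and </script>.
SCRIPT_TAG = re.compile(r'</?script\b[^>]*>', re.IGNORECASE)


def strip_once(html):
    """Remove every script tag found in one left-to-right pass."""
    return SCRIPT_TAG.sub('', html)


def strip_until_stable(html, max_passes=10):
    """Repeat strip_once until a pass changes nothing (a fixed point)."""
    for _ in range(max_passes):
        cleaned = strip_once(html)
        if cleaned == html:
            return cleaned
        html = cleaned
    raise ValueError('input did not stabilise after %d passes' % max_passes)


# Fragments around the inner tags join up once the inner tags are removed.
payload = '<scr<script>ipt>alert(1)</scr</script>ipt>'
single_pass = strip_once(payload)
stable = strip_until_stable(payload)
```

Here a single pass yields a working script element, while the repeated version ends with plain text. Repeating only addresses reassembly; HTML-escaping whatever text remains is generally a more robust complement.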
