
Sanitize text field HTML (here from the Dojo Toolkit Editor2 widget)

Author: akaihola
Posted: April 10, 2007
Language: Python
Django version: 0.96
Tags: forms html wysiwyg dojo security sanitize
Score: 2 (after 2 ratings)

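When a JavaScript WYSIWYG editor widget is used for text area content, the submitted HTML must be sanitized on the server so that no disallowed HTML tags (especially script tags) are present. A client can bypass the editor and post arbitrary markup, so client-side filtering alone is not enough.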

The solution below uses the BeautifulSoup library (version 3, imported as the BeautifulSoup module) for HTML processing, so it must be installed on your Python path.

The snippet also assumes that you have the Dojo Toolkit and its Editor2 widget loaded on your page.

Note: this snippet was originally written for use with Dojo Toolkit 0.4, and it hasn't been updated for 0.9 or 1.0.

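from django import newforms as forms  # Django 0.96; renamed django.forms in 1.0
from BeautifulSoup import BeautifulSoup, Comment  # BeautifulSoup 3

class Editor2Field(forms.CharField):

    # Render a <textarea> that Dojo 0.4 turns into an Editor2 widget.
    widget = forms.widgets.Textarea(attrs={'dojoType': 'Editor2'})

    # Whitelists: any other tag or attribute is stripped on clean().
    valid_tags = 'p i strong b u a h1 h2 h3 pre br img'.split()
    valid_attrs = 'href src'.split()

    def clean(self, value):
        """
        Strips disallowed HTML from the input.
        """
        value = super(Editor2Field, self).clean(value)
        soup = BeautifulSoup(value)
        # Remove HTML comments entirely.
        for comment in soup.findAll(
            text=lambda text: isinstance(text, Comment)):
            comment.extract()
        for tag in soup.findAll(True):
            # Hide disallowed tags: their own markup is not rendered,
            # but their children still are.
            if tag.name not in self.valid_tags:
                tag.hidden = True
            # Keep only whitelisted attributes. Attribute *values* are
            # not checked, so javascript: URLs in href are not caught.
            tag.attrs = [(attr, val) for attr, val in tag.attrs
                         if attr in self.valid_attrs]
        return soup.renderContents().decode('utf8')


class TestForm(forms.Form):
    title = forms.CharField()
    content = Editor2Field()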
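If BeautifulSoup is not available, the same whitelist idea can be sketched with Python 3's standard-library html.parser. This is a minimal sketch under stated assumptions, not the snippet's method: WhitelistSanitizer and sanitize() are hypothetical names, and the tag and attribute whitelists simply copy those of Editor2Field. One deliberate difference: where Editor2Field.clean hides a disallowed tag but keeps its children, this version also discards the text inside script and style elements. Like the snippet, it does not check href or src values for javascript: URLs.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelists copied from Editor2Field; adjust as needed.
VALID_TAGS = {"p", "i", "strong", "b", "u", "a", "h1", "h2", "h3",
              "pre", "br", "img"}
VALID_ATTRS = {"href", "src"}
VOID_TAGS = {"br", "img"}           # never emit a closing tag for these
DROP_CONTENT = {"script", "style"}  # drop these tags and their text


class WhitelistSanitizer(HTMLParser):
    """Rebuild the input, keeping only whitelisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in VALID_TAGS:
            return
        kept = "".join(' %s="%s"' % (name, escape(value or "", quote=True))
                       for name, value in attrs if name in VALID_ATTRS)
        self.out.append("<%s%s>" % (tag, kept))

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: treat it as a start tag only.
        if tag not in DROP_CONTENT:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in VALID_TAGS or tag in VOID_TAGS:
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape text.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # Comments, doctypes and processing instructions fall through to
    # HTMLParser's no-op handlers, so they are dropped from the output.


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, sanitize('<p onclick="x()">Hi <script>alert(1)</script><b>there</b></p>') returns '<p>Hi <b>there</b></p>'. Rebuilding the output from parser events, rather than editing the input string, means anything the handlers do not explicitly emit is dropped by default.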

More like this

  1. Sanitize HTML filter with tag/attribute whitelist and XSS protection by harrym 5 years, 10 months ago
  2. urlize HTML by maguspk 4 years, 11 months ago
  3. TinyMCE Widget by semente 5 years, 9 months ago
  4. Sanitize HTML filter by henriklied 8 years, 1 month ago
  5. Revisiting Pygments and Markdown by djypsy 7 years, 9 months ago

Comments

guettli (on November 16, 2007):

Nice snippet!


marcink (on February 10, 2008):

This is nice, but you should also look into href attributes to make sure they don't contain javascript code.


akaihola (on April 21, 2008):

marcink: Thanks for the heads up. It's obviously a fatal mistake to have left out that check.

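The href check marcink asks for is safer as an allowlist of URL schemes than as a search for the string "javascript:", because browsers decode character references and ignore embedded tabs or newlines in the scheme. The following is a minimal Python 3, stdlib-only sketch; is_safe_url and SAFE_SCHEMES are hypothetical names, and the allowed schemes are an assumption to adjust for your site.

```python
import html
import re
from urllib.parse import urlsplit

# Hypothetical allowlist of schemes permitted in href/src values.
SAFE_SCHEMES = {"http", "https", "mailto"}

# ASCII control characters and whitespace, which browsers skip inside a
# scheme (so "java\tscript:" would still run). Stripped for checking only.
_IGNORED = re.compile(r"[\x00-\x20\x7f]+")


def is_safe_url(value):
    """Return True if value is relative or uses an allowlisted scheme."""
    # Decode entities such as "&#106;" first: a browser decodes them in
    # attribute values before it interprets the URL.
    cleaned = _IGNORED.sub("", html.unescape(value))
    scheme = urlsplit(cleaned).scheme.lower()
    return scheme == "" or scheme in SAFE_SCHEMES
```

With this in place, the attribute filter in Editor2Field.clean could additionally require is_safe_url(val) for href and src (after porting the snippet to Python 3). Relative links whose first path segment contains a colon, such as "page:1", would be rejected; that trade-off is intentional in an allowlist.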
