Sanitize HTML filter
Originally posted by [akaihola](http://www.djangosnippets.org/users/akaihola/) as [snippet #169](http://www.djangosnippets.org/snippets/169/). I just redid it as a filter.
- html
- sanitize
Originally posted by [akaihola](http://www.djangosnippets.org/users/akaihola/) as [snippet #169](http://www.djangosnippets.org/snippets/169/). I just redid it as a filter.
When using a JavaScript WYSIWYG editor widget for text area content, the resulting HTML should be sanitized so no unallowed HTML tags (esp. script tags) are present. The [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/) library handles HTML processing in the solution presented above, so you should place it in the Python path. The snippet also assumes that you have [the Dojo Toolkit](http://dojotoolkit.org/) and its Editor2 widget loaded on your page. **Note**: this snippet was originally written for use with Dojo Toolkit 0.4, and it hasn't been updated for 0.9 or 1.0.
Usefull for TinyMCE, to allow some HTML but be vunarable by XXS attacks You need to install html5lib sudo easy_install html5lib
Reworked version of [this snippet](http://www.djangosnippets.org/snippets/205/) that now accepts an argument so the user can specify which tags to allow, and which attributes should be allowed for each tag. Argument should be in form `tag2:attr1:attr2 tag2:attr1 tag3`, where tags are allowed HTML tags, and attrs are the allowed attributes for that tag. It also uses code from [this post on stack overflow](http://stackoverflow.com/questions/16861/sanitising-user-input-using-python) to add XSS protection.
I was about to start an online community but every time you allow people to post something as a comment you never know what they come up to, especially regarding profanities. So I come up with this idea, I put together some code from the old style form validators and the new newform style, plus some code to sanitize HTML from snippet number [169](http://www.djangosnippets.org/snippets/169/), and the final result is a CharField that only accept values without swear words, profanities, curses and bad html. Cheers.