def strip_tags(text, valid_tags={}):
    # BeautifulSoup 3 API (Python 2); valid_tags maps tag name -> allowed attribute names.
    from BeautifulSoup import BeautifulSoup, Comment
    soup = BeautifulSoup(text)
    # Remove HTML comments.
    for comment in soup.findAll(text=lambda text: isinstance(text, Comment)):
        comment.extract()
    for tag in soup.findAll(True):
        if tag.name in valid_tags:
            # Keep only whitelisted attributes, deleting the literal 'javascript:'.
            valid_attrs = valid_tags[tag.name]
            tag.attrs = [(attr, val.replace('javascript:', ''))
                         for attr, val in tag.attrs if attr in valid_attrs]
        else:
            # Hide the tag itself but still render its contents.
            tag.hidden = True
    return soup.renderContents().decode('utf8')
Comments
This snippet is not enough to protect against malicious input from users. For example, a link with an href of javascript : alert('evil') would bypass the filter here and would probably still work in most browsers. Sanitising HTML is a very, very hard problem with an awful lot of edge cases; I'm sure there are plenty of other holes in the above code.
None of my browsers runs «javascript : alert('evil')» as JavaScript. All of them try to open a file with that name.
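Whatever a particular browser does with the spaced form, the replace('javascript:', '') step in the snippet only removes one exact lowercase substring, so other spellings pass through untouched. The sketch below illustrates that with plain strings and then shows one possible alternative: allowing only known URL schemes. The names naive_clean, has_allowed_scheme and ALLOWED_SCHEMES are hypothetical, the allowed set is an assumption, and this is deliberately conservative rather than a complete sanitiser.

```python
import re

# Assumption: the schemes you are willing to accept in href/src values.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def naive_clean(value):
    """Mirror the snippet's approach: delete one exact, lowercase substring."""
    return value.replace("javascript:", "")


def has_allowed_scheme(value):
    """Return True for relative URLs and for URLs whose scheme is allowed."""
    # Drop whitespace and control characters so forms such as
    # "java\tscript:" or "javascript :" are compared in a normalised form.
    compact = re.sub(r"[\x00-\x20]", "", value)
    # A scheme can only appear before the first "/", "?" or "#".
    head = re.split(r"[/?#]", compact, maxsplit=1)[0]
    if ":" not in head:
        return True  # no scheme at all: treat as a relative URL
    scheme = head.split(":", 1)[0].lower()
    return scheme in ALLOWED_SCHEMES


if __name__ == "__main__":
    samples = [
        "javascript : alert('evil')",
        "JAVASCRIPT:alert(1)",
        "jajavascript:vascript:alert(1)",
        "http://example.com/a:b",
        "/relative/path",
    ]
    for url in samples:
        print(repr(url), "->", repr(naive_clean(url)), has_allowed_scheme(url))
```

Note the third sample: deleting the inner 'javascript:' leaves the outer characters to join up into a new 'javascript:' prefix, which is why stripping a blacklist of substrings tends to be weaker than checking against a whitelist.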