djangosnippets: strip_tags like php one

Author:: homm
Posted:: April 29, 2010
Language:: Python
Version:: 1.1
Score:: 0 (after 0 ratings)


Usage:

clean_html = strip_tags(html, {'a': ['href'], 'p': ['class']})

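def strip_tags(text, valid_tags=None):
    # Requires BeautifulSoup 3 (Python 2).
    from BeautifulSoup import BeautifulSoup, Comment

    # Avoid a shared mutable default; no whitelist means every tag is stripped.
    valid_tags = valid_tags or {}
    soup = BeautifulSoup(text)
    # Drop HTML comments entirely.
    for comment in soup.findAll(text=lambda text: isinstance(text, Comment)):
        comment.extract()
    for tag in soup.findAll(True):
        if tag.name in valid_tags:
            # Keep only whitelisted attributes. The 'javascript:' replacement is
            # a literal, case-sensitive match and is not a complete defence.
            valid_attrs = valid_tags[tag.name]
            tag.attrs = [(attr, val.replace('javascript:', ''))
                for attr, val in tag.attrs if attr in valid_attrs]
        else:
            # Hide the tag itself but keep its contents in the output.
            tag.hidden = True
    return soup.renderContents().decode('utf8')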

Comments

simon (on May 4, 2010):

This snippet is not enough to protect against malicious input from users - for example, a link with an href of javascript : alert('evil') would bypass the filter here and would probably still work in most browsers. Sanitising HTML is a very, very hard problem with an awful lot of edge cases - I'm sure there are plenty of other holes in the above code.

homm (on May 4, 2010):

None of my browsers runs «javascript : alert('evil')» as JavaScript. All of them try to open a file with that name.
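
Whatever a particular browser does with the spaced-out variant, the literal replace('javascript:', '') in the snippet can be sidestepped in other ways. The sketch below is not part of the original snippet: naive_clean mirrors the snippet's replacement, and has_allowed_scheme is a hypothetical helper with an assumed scheme allowlist. It accepts only known schemes instead of trying to delete a bad one, and it is deliberately stricter than a browser because it ignores all whitespace and control characters before looking for a scheme.

```python
import re

# Assumption: the schemes this hypothetical site is willing to link to.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def naive_clean(value):
    # Same literal, case-sensitive replacement the snippet applies to attributes.
    return value.replace("javascript:", "")


def has_allowed_scheme(value):
    """Return True for relative URLs and URLs whose scheme is allowlisted."""
    # Remove whitespace and control characters, which can be used to
    # disguise a scheme (browsers drop tabs and newlines inside URLs).
    compact = re.sub(r"[\x00-\x20]", "", value)
    # A scheme can only appear before the first '/', '?' or '#'.
    head = re.split(r"[/?#]", compact, maxsplit=1)[0]
    if ":" not in head:
        return True  # no scheme at all: treat as a relative URL
    scheme = head.split(":", 1)[0].lower()
    return scheme in ALLOWED_SCHEMES


# Inputs the literal replacement leaves dangerous or untouched.
for candidate in (
    "JavaScript:alert(1)",             # different case: not replaced
    "java\tscript:alert(1)",           # embedded tab: not replaced
    "javajavascript:script:alert(1)",  # nested: replacement rebuilds the scheme
):
    print(repr(naive_clean(candidate)), has_allowed_scheme(naive_clean(candidate)))
```

A check like this would be applied to href and src values in place of the replace() call, dropping the attribute when it returns False. It only addresses URL schemes; other holes, such as style attributes or event handlers let through by a permissive whitelist, need separate handling.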
