strip_tags, like PHP's strip_tags

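def strip_tags(text, valid_tags=None):
    """Strip all tags from text except those in valid_tags, like PHP's strip_tags.

    valid_tags maps a tag name to the attribute names to keep on that tag,
    e.g. {'a': ['href'], 'b': []}.
    """
    # BeautifulSoup 3 API: tag.attrs is a list of (name, value) pairs.
    from BeautifulSoup import BeautifulSoup, Comment

    valid_tags = valid_tags or {}
    soup = BeautifulSoup(text)
    # Drop HTML comments entirely.
    for comment in soup.findAll(text=lambda text: isinstance(text, Comment)):
        comment.extract()
    for tag in soup.findAll(True):
        if tag.name in valid_tags:
            # Keep only whitelisted attributes. Removing the literal
            # 'javascript:' is a weak filter, not real XSS protection.
            valid_attrs = valid_tags[tag.name]
            tag.attrs = [(attr, val.replace('javascript:', ''))
                for attr, val in tag.attrs if attr in valid_attrs]
        else:
            # Hide the tag itself but keep its contents in the output.
            tag.hidden = True
    return soup.renderContents().decode('utf8')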
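
The attribute filter in the snippet only removes the exact lowercase substring 'javascript:'. The sketch below uses only the standard library. It first reproduces that filter to show a few inputs it lets through, then contrasts it with a scheme allowlist. The names naive_clean, has_allowed_scheme and ALLOWED_SCHEMES are hypothetical and not part of the snippet, and the allowlist is an illustration of the idea rather than a complete sanitiser.

```python
import re

# Hypothetical allowlist for illustration; adjust to the schemes you need.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def naive_clean(value):
    """Same filter as the snippet: remove the literal 'javascript:'."""
    return value.replace('javascript:', '')


def has_allowed_scheme(value):
    """Return True for scheme-less (relative) URLs or allowlisted schemes."""
    # Strip whitespace and control characters before looking for a scheme,
    # erring on the side of rejection.
    compact = re.sub(r'[\x00-\x20\x7f]+', '', value).lower()
    match = re.match(r'([a-z][a-z0-9+.\-]*):', compact)
    if match is None:
        return True  # no scheme found: treated as a relative URL
    return match.group(1) in ALLOWED_SCHEMES


# Inputs the literal replace does not neutralise:
print(naive_clean("JavaScript:alert(1)"))             # different case, unchanged
print(naive_clean("javajavascript:script:alert(1)"))  # removal re-forms the scheme

# The allowlist rejects anything that is not explicitly permitted:
print(has_allowed_scheme("javascript : alert('evil')"))
print(has_allowed_scheme("https://example.com/"))
```

Checking for permitted schemes rather than deleting a forbidden one avoids the case where removing a substring produces the very string being filtered. It still does not cover every edge case, such as character entities that are decoded after the check.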

Comments

simon (on May 4, 2010):

This snippet is not enough to protect against malicious input from users. For example, a link with an href of javascript : alert('evil') would bypass the filter here and would probably still work in most browsers. Sanitising HTML is a very, very hard problem with an awful lot of edge cases, and I'm sure there are plenty of other holes in the above code.

homm (on May 4, 2010):

None of my browsers runs «javascript : alert('evil')» as JavaScript. All of them try to open a file with that name.
