strip_tags like PHP's strip_tags

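from bs4 import BeautifulSoup, Comment


def strip_tags(text, valid_tags=None):
    """Strip HTML tags from text, keeping only the tags in valid_tags.

    valid_tags maps each allowed tag name to a list of allowed attribute
    names, e.g. {'a': ['href'], 'p': []}. Requires beautifulsoup4.
    """
    valid_tags = valid_tags or {}  # avoid a shared mutable default
    soup = BeautifulSoup(text, 'html.parser')
    # Drop HTML comments entirely.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()
    for tag in soup.find_all(True):
        if tag.name in valid_tags:
            valid_attrs = valid_tags[tag.name]
            # Keep only whitelisted attributes (multi-valued ones such as
            # class arrive as lists) and remove the literal 'javascript:'.
            tag.attrs = {
                attr: (' '.join(val) if isinstance(val, list) else val).replace('javascript:', '')
                for attr, val in tag.attrs.items() if attr in valid_attrs
            }
        else:
            # Remove the tag itself but keep its children and text.
            tag.unwrap()
    return soup.decode_contents()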

More like this

  1. template code for "Google v3 geocoding for Geodjango admin site" by samhag 1 year, 5 months ago
  2. google.html template for GoogleAdmin by jbronn 5 years, 6 months ago
  3. google.js template for GoogleAdmin by jbronn 5 years, 6 months ago
  4. Sanitize HTML filter by henriklied 6 years, 11 months ago
  5. Admin related widget wrapper with edit / delete link (html) by nasp 2 years, 6 months ago

Comments

simon (on May 4, 2010):

This snippet is not enough to protect against malicious input from users - for example, a URL with an href of javascript : alert('evil') would bypass the filter here and would probably still work in most browsers. Sanitising HTML is a very, very hard problem with an awful lot of edge cases - I'm sure there are plenty of other holes in the above code.


homm (on May 4, 2010):

None of my browsers run «javascript : alert('evil')» as JavaScript. All of them try to open a file with that name.
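Whatever browsers do with the spaced form above, the literal `replace('javascript:', '')` in the snippet has other gaps: URL schemes are case-insensitive, the replacement makes only a single pass, and browsers remove ASCII tab and newline characters from a URL before reading its scheme, so `java\tscript:` still runs. Below is a minimal sketch of an allow-list check instead of a block-list, assuming only `http`, `https` and `mailto` links should survive; `is_safe_url` and `ALLOWED_SCHEMES` are hypothetical names, not part of the snippet or of BeautifulSoup.

```python
import re

# Hypothetical allow-list: adjust to the schemes your site actually needs.
ALLOWED_SCHEMES = {"http", "https", "mailto"}

# Browsers delete ASCII tab/newline anywhere in a URL and trim leading and
# trailing C0 control characters and spaces before looking for the scheme.
_TAB_OR_NEWLINE = re.compile(r"[\t\n\r]")
_C0_AND_SPACE = "".join(chr(c) for c in range(0x21))
# RFC 3986 scheme: a letter followed by letters, digits, '+', '-' or '.'.
_SCHEME = re.compile(r"([A-Za-z][A-Za-z0-9+.\-]*):")


def is_safe_url(value: str) -> bool:
    """Return True if value is a relative URL or uses an allowed scheme."""
    cleaned = _TAB_OR_NEWLINE.sub("", value).strip(_C0_AND_SPACE)
    match = _SCHEME.match(cleaned)
    if match is None:
        # No scheme at all: the browser resolves it as a relative URL.
        # "javascript : alert('evil')" lands here because of the space.
        return True
    # Scheme names are case-insensitive, so compare in lower case.
    return match.group(1).lower() in ALLOWED_SCHEMES


# The single-pass, case-sensitive replace from the snippet is easy to defeat:
print("JavaScript:alert(1)".replace("javascript:", ""))             # JavaScript:alert(1)
print("javajavascript:script:alert(1)".replace("javascript:", ""))  # javascript:alert(1)
print(is_safe_url("JavaScript:alert(1)"), is_safe_url("java\tscript:alert(1)"))  # False False
print(is_safe_url("https://example.com/"), is_safe_url("/local/page"))           # True True
```

Inside `strip_tags`, a check like this could replace the `replace('javascript:', '')` call for URL-valued attributes such as `href` and `src`, dropping the attribute when the check fails. It addresses only one of the many edge cases mentioned in the first comment.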

