
strip_tags like the PHP one

Author:
homm
Posted:
April 29, 2010
Language:
Python
Version:
1.1
Tags:
tags html
Score:
0 (after 0 ratings)

Usage:

clean_html = strip_tags(html, {'a': ['href'], 'p': ['class']})

Based on another snippet.

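def strip_tags(text, valid_tags=None):
    """Strip all tags except those in valid_tags (tag name -> allowed attributes)."""
    # Written for BeautifulSoup 3, the Python 2 package named BeautifulSoup.
    from BeautifulSoup import BeautifulSoup, Comment

    valid_tags = valid_tags or {}
    soup = BeautifulSoup(text)
    # Remove HTML comments entirely.
    for comment in soup.findAll(text=lambda text: isinstance(text, Comment)):
        comment.extract()
    for tag in soup.findAll(True):
        if tag.name in valid_tags:
            # Keep only the allowed attributes and drop the literal
            # substring 'javascript:' from their values.
            valid_attrs = valid_tags[tag.name]
            tag.attrs = [(attr, val.replace('javascript:', ''))
                for attr, val in tag.attrs if attr in valid_attrs]
        else:
            # Hide the tag itself but still render its contents.
            tag.hidden = True
    return soup.renderContents().decode('utf8')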
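
For readers on Python 3, where the BeautifulSoup 3 package used above is not available, the same idea can be sketched with the standard library's html.parser. This is a swapped-in technique, not the author's code: the names TagStripper and strip_tags_stdlib are hypothetical, and the sketch does not filter URL schemes in attribute values at all.

```python
from html import escape
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Re-emit the input, keeping only allowed tags and attributes."""

    def __init__(self, valid_tags):
        # convert_charrefs=False passes entities such as &amp; through as written.
        super().__init__(convert_charrefs=False)
        self.valid_tags = valid_tags
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.valid_tags:
            return  # drop the tag itself; its text content is still emitted
        allowed = self.valid_tags[tag]
        rendered = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
            if name in allowed
        )
        self.parts.append("<%s%s>" % (tag, rendered))

    def handle_endtag(self, tag):
        if tag in self.valid_tags:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)

    # handle_comment is not overridden, so comments are dropped.


def strip_tags_stdlib(text, valid_tags=None):
    parser = TagStripper(valid_tags or {})
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


html = ('<p class="x" id="y">Hi <b>there</b><!-- c --> '
        '<a href="/u" onclick="e()">link</a></p>')
print(strip_tags_stdlib(html, {'a': ['href'], 'p': ['class']}))
# -> <p class="x">Hi there <a href="/u">link</a></p>
```

As with the original, the text inside a removed tag (including a script tag) is kept; only the tag markup is dropped.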

More like this

  1. template code for "Google v3 geocoding for Geodjango admin site" by samhag 2 years, 10 months ago
  2. google.html template for GoogleAdmin by jbronn 6 years, 10 months ago
  3. google.js template for GoogleAdmin by jbronn 6 years, 10 months ago
  4. Sanitize HTML filter by henriklied 8 years, 4 months ago
  5. Admin related widget wrapper with edit / delete link (html) by nasp 3 years, 10 months ago

Comments

simon (on May 4, 2010):

This snippet is not enough to protect against malicious input from users. For example, a link with an href of javascript : alert('evil') would bypass the filter here and would probably still work in most browsers. Sanitising HTML is a very, very hard problem with an awful lot of edge cases, and I'm sure there are plenty of other holes in the above code.
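
To make the point about holes concrete, here is a small sketch of what the snippet's val.replace('javascript:', '') does to a few inputs. It makes no claim about how any particular browser treats the spaced variant. The helper names naive_clean and has_allowed_scheme and the ALLOWED_SCHEMES set are assumptions for illustration, and the allowlist check is one possible stricter approach rather than a complete sanitizer.

```python
import re


def naive_clean(value):
    # The same transformation the snippet applies to attribute values.
    return value.replace('javascript:', '')


# Exact match: the prefix is removed.
print(naive_clean("javascript:alert('evil')"))
# Space before the colon: no match, value is returned unchanged.
print(naive_clean("javascript : alert('evil')"))
# URL schemes are case-insensitive, but str.replace is not: unchanged.
print(naive_clean("JavaScript:alert('evil')"))
# Nested payload: removing the inner match rebuilds 'javascript:'.
print(naive_clean("javajavascript:script:alert('evil')"))

ALLOWED_SCHEMES = {"http", "https", "mailto"}  # assumption: adjust as needed


def has_allowed_scheme(value):
    """Return True for relative URLs and for URLs with an allowed scheme."""
    # Drop whitespace and control characters everywhere, then lowercase,
    # so spaced and mixed-case variants are compared in one form.
    compact = re.sub(r"[\x00-\x20]+", "", value).lower()
    # Text before the first ':' is a scheme only if no '/', '?' or '#'
    # comes before that colon.
    match = re.match(r"([^/?#:]*):", compact)
    if match is None:
        return True  # no scheme: treated as a relative URL
    return match.group(1) in ALLOWED_SCHEMES


for candidate in ["javascript : alert('evil')", "JavaScript:alert(1)",
                  "http://example.com/", "/path?x=a:b"]:
    print(candidate, has_allowed_scheme(candidate))
```

Checking against a list of allowed schemes, instead of deleting one forbidden substring, rejects the spaced, mixed-case, and nested variants in one step.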

#

homm (on May 4, 2010):

None of my browsers runs «javascript : alert('evil')» as JavaScript. All of them try to open a file with that name.

#
