
plaintext filter

Author: onelson
Posted: April 9, 2010
Language: Python
Version: 1.1
Score: 1 (after 1 rating)

Inspired by this terse blog post.

This filter is designed to strip all (X)HTML out of a template variable while preserving some metadata from anchor and image tags: the title and href of each link, and the src, title and alt of each image.

Why is this useful? Suppose you have pre-assembled portions of templates, or model fields containing HTML, that you want to use to populate a search index such as django-haystack. You can safely discard all the markup while keeping the text that should remain searchable. Alt text and title attributes are worth keeping!

# Requires BeautifulSoup 4 (pip install beautifulsoup4).
from bs4 import BeautifulSoup, NavigableString
from django import template
from django.template.defaultfilters import stringfilter

register = template.Library()


def _meta(tag, names):
    """Return ' (v1, v2, ...)' for the named attributes set on tag, or ''."""
    # Attributes that are missing or empty are skipped.
    values = [tag.get(name) for name in names if tag.get(name)]
    return ' (%s)' % ', '.join(values) if values else ''


@register.filter(name='plaintext')
@stringfilter
def plaintext(value):
    """Strip all (X)HTML, keeping link and image metadata as plain text."""
    soup = BeautifulSoup(value, 'html.parser')

    # Images first, so an <img> inside a link contributes its
    # src/title/alt to the link text instead of being discarded with it.
    for img in soup.find_all('img'):
        img.replace_with(NavigableString(_meta(img, ('src', 'title', 'alt'))))

    # get_text() also copes with links that contain nested markup,
    # where a.string would be None.
    for a in soup.find_all('a'):
        a.replace_with(NavigableString(a.get_text() + _meta(a, ('title', 'href'))))

    # Deliberately NOT marked safe: the parser decodes entities such as
    # &lt;, so Django's autoescaping must still apply to the result.
    return soup.get_text()
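If you would rather not depend on BeautifulSoup, the same transformation can be sketched with Python's standard-library html.parser. This is an illustrative alternative, not the snippet's own implementation: the PlaintextParser class and the standalone plaintext() function below are hypothetical names, attributes that are missing or empty are skipped, and registering it as a Django template filter is left out.

```python
from html.parser import HTMLParser


class PlaintextParser(HTMLParser):
    """Hypothetical stdlib-only variant: collect text nodes and append
    link/image metadata as plain text (not part of the original snippet)."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; in text.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._link_meta = []  # stack of pending ' (title, href)' suffixes

    @staticmethod
    def _suffix(attrs, names):
        values = dict(attrs)
        # Skip attributes that are missing, valueless (None) or empty.
        meta = [values[n] for n in names if values.get(n)]
        return ' (%s)' % ', '.join(meta) if meta else ''

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # Hold the link metadata until the closing </a> is seen.
            self._link_meta.append(self._suffix(attrs, ('title', 'href')))
        elif tag == 'img':
            self.parts.append(self._suffix(attrs, ('src', 'title', 'alt')))

    def handle_endtag(self, tag):
        if tag == 'a' and self._link_meta:
            self.parts.append(self._link_meta.pop())

    def handle_data(self, data):
        self.parts.append(data)

    def close(self):
        super().close()  # flushes any buffered text
        # Emit metadata for any <a> that was never closed.
        while self._link_meta:
            self.parts.append(self._link_meta.pop())


def plaintext(value):
    parser = PlaintextParser()
    parser.feed(value)
    parser.close()
    return ''.join(parser.parts)


print(plaintext('<p>See <a href="/docs/" title="Docs">the docs</a>.</p>'))
```

Because HTMLParser is event-driven rather than tree-based, a link's title and href are kept on a small stack until its closing tag arrives, which places them after the link text just as the BeautifulSoup version does.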

More like this

  1. Template tag - list punctuation for a list of items by shapiromatron 2 months ago
  2. JSONRequestMiddleware adds a .json() method to your HttpRequests by cdcarter 2 months, 1 week ago
  3. Serializer factory with Django Rest Framework by julio 9 months, 1 week ago
  4. Image compression before saving the new model / work with JPG, PNG by Schleidens 9 months, 4 weeks ago
  5. Help text hyperlinks by sa2812 10 months, 3 weeks ago
