Login

filter for extracting a number of paragraphs from any HTML code

Author:
rafadev
Posted:
June 10, 2011
Language:
Python
Version:
1.2
Score:
1 (after 1 ratings)

From: incredible times

With inspiration from: Unethical Blogger

This code parses any provided HTML content, and extracts a number of paragraphs specified, with all the content and tags inside them.

Example: Template variable "content" contains:

<a href="#>some text</a>
<p><strong>Testing</strong>testing testing this is a tester's life</p>
<div>I wont see the world</div>
<p>Another paragraph</p>

So, place this code in any loaded template module (inside a templatetags folder of your app... i.e. myapp/templatetags/myutils.py)

{% load myutils %}
{{ content|paragraphs:"1"}}

Would return:

<p><strong>Testing</strong>testing testing this is a tester's life</p>

Whereas

{% load myutils %}
{{ content|paragraphs:"2"}}

Returns:

<p><strong>Testing</strong>testing testing this is a tester's life</p>
<p>Another paragraph</p>
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
@register.filter
def paragraphs(var, arg):
    """ Retrieves n number of paragraphs from the supplied text. It doesn't remove 
    any existing tags inside paragraphs."""
    class ParagraphParser(HTMLParser):
        def __init__(self, *args, **kwargs):
            HTMLParser.__init__(self)
            self.stack = []
            self.paragraphs = int(arg)
            self.in_p = False
            self.p_count = 0
            
        def handle_starttag(self, tag, attrs):   
            if tag == 'p':
                if self.p_count < self.paragraphs:
                    self.in_p = True
                    self.p_count += 1
                else:
                    self.in_p = False
            
            if self.in_p:
                self.stack.append(self.__html_start_tag(tag, attrs))


        def handle_endtag(self, tag):
            if self.in_p:
                self.stack.append(u"</%s>" % (tag))
                if tag == 'p':
                    self.in_p = False
        
        def handle_startendtag(self, tag, attrs):
            if self.in_p:
                self.stack.append(self.__html_startend_tag(tag, attrs))

        def handle_data(self, data):
            if self.in_p:
                self.stack.append(data)

        def __html_attrs(self, attrs):
            _attrs = u""
            if attrs:
                _attrs = u" %s" % (' '.join([('%s="%s"' % (k,v)) for k,v in attrs.iteritems()]))
            return _attrs

        def __html_start_tag(self, tag, attrs):
            return u"<%s%s>" % (tag, self.__html_attrs(attrs)) 
        
        def __html_startend_tag(self, tag, attrs):
            return "<%s%s/>" % (tag, self.__html_attrs(attrs))

        def render(self):
            return u"".join(self.stack)


    parseme = ParagraphParser()
    
    try:
        parseme.feed(var)
    except HTMLParseError:
        return var
    
    return parseme.render()

# make sure output is not escaped... it contains HTML!
paragraphs.is_safe = True

More like this

  1. Template tag - list punctuation for a list of items by shapiromatron 11 months, 2 weeks ago
  2. JSONRequestMiddleware adds a .json() method to your HttpRequests by cdcarter 11 months, 3 weeks ago
  3. Serializer factory with Django Rest Framework by julio 1 year, 6 months ago
  4. Image compression before saving the new model / work with JPG, PNG by Schleidens 1 year, 7 months ago
  5. Help text hyperlinks by sa2812 1 year, 8 months ago

Comments

Please login first before commenting.