typygmentdown

Author: ubernostrum
Posted: August 22, 2007
Language: Python
Version: .96
Score: 7 (after 7 ratings)

Based heavily on snippet #119, this is an all-in-one function which applies Markdown and typogrify, and adds Pygments highlighting (selected from a class name or by having Pygments guess the language) to any <code> elements found in the text.

It also adds some niceties for picking up useful arguments to Markdown and Pygments, and registers itself as a markup filter with the template_utils formatter.
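The lexer selection can be sketched on its own, without Pygments installed. This is a minimal illustration, not the snippet's code: `choose_lexer`, `toy_by_name`, `toy_guess` and the `KNOWN` table are hypothetical stand-ins for `get_lexer_by_name` and `guess_lexer`, and they are assumed to signal failure with `ValueError`, as the snippet's `except` clauses expect.

```python
def choose_lexer(css_class, source, by_name, guess):
    """Pick a lexer using the same fallback order as the snippet.

    by_name and guess are hypothetical callables standing in for
    Pygments' get_lexer_by_name and guess_lexer; both are assumed to
    raise ValueError when they cannot find a lexer.
    """
    # As in the snippet, a missing class attribute defaults to 'text'.
    language = css_class if css_class is not None else "text"
    try:
        return by_name(language)
    except ValueError:
        try:
            return guess(source)
        except ValueError:
            return by_name("text")


# Toy stand-ins so the sketch runs without Pygments.
KNOWN = {"python": "PythonLexer", "text": "TextLexer"}


def toy_by_name(name):
    if name not in KNOWN:
        raise ValueError("no lexer for alias %r" % name)
    return KNOWN[name]


def toy_guess(source):
    if source.startswith("#!"):
        return "ShellLexer"
    raise ValueError("could not guess a lexer")


by_class = choose_lexer("python", "x = 1", toy_by_name, toy_guess)
guessed = choose_lexer("nosuch", "#!/bin/sh", toy_by_name, toy_guess)
fallback = choose_lexer("nosuch", "hello", toy_by_name, toy_guess)
no_class = choose_lexer(None, "#!/bin/sh", toy_by_name, toy_guess)
```

One consequence of this ordering: an element with no class defaults to 'text', which resolves on the first lookup, so the guessing step is only reached when a class is present but not recognised.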

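Requirements: BeautifulSoup, Markdown, Pygments, typogrify, and template_utils (the modules the snippet imports).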

from BeautifulSoup import BeautifulSoup
from markdown import markdown
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name, guess_lexer
from template_utils.markup import formatter
from typogrify.templatetags.typogrify import typogrify

def typygmentdown(text, **kwargs):
    """
    Given a string of text using Markdown syntax, applies the
    following transformations:
    
    1. Searches out and temporarily removes any raw ``<code>``
       elements in the text.
    2. Applies Markdown and typogrify to the remaining text.
    3. Applies Pygments highlighting to the contents of the removed
       ``<code>`` elements.
    4. Re-inserts the ``<code>`` elements and returns the result.
    
    The Pygments lexer to be used for highlighting is determined by
    the ``class`` attribute of each ``<code>`` element found. If the
    class does not name a known lexer, Pygments will attempt to guess
    the correct lexer before falling back on plain text; elements
    with no ``class`` attribute are highlighted as plain text.
    
    The following keyword arguments are understood and passed to
    markdown if found:
    
    * ``extensions``
    
    Markdown's ``safe_mode`` argument is *not* passed on, because it
    would cause the temporary ``<code>`` elements in the text to be
    escaped.
    
    The following keyword arguments are understood and passed to
    Pygments if found:
    
    * ``linenos``
    
    The removal, separate highlighting and re-insertion of the
    ``<code>`` elements is necessary because Markdown and SmartyPants
    do not reliably avoid formatting text inside these elements;
    removing them before applying Markdown and typogrify means they
    are in no danger of having extraneous HTML or fancy punctuation
    inserted by mistake.
    
    Original implementation by user 'blinks' as snippet #119 at
    djangosnippets: http://www.djangosnippets.org/snippets/119/. This
    version makes the following changes:

    * The name of the function is now ``typygmentdown``.
    * The argument signature has changed to work better with the
      ``template_utils`` formatter.
    * The ``extensions`` and ``linenos`` arguments are looked for and
      passed to Markdown and Pygments, respectively.
    * The function is registered with the ``template_utils``
      formatter.
    
    """
    soup = BeautifulSoup(str(text))
    code_blocks = soup.findAll('code')
    for block in code_blocks:
        block.replaceWith('<code class="removed"></code>')
    htmlized = typogrify(markdown(str(soup), extensions=kwargs.get('extensions', [])))
    soup = BeautifulSoup(htmlized)
    empty_code_blocks, index = soup.findAll('code', 'removed'), 0
    formatter = HtmlFormatter(cssclass='typygmentdown', linenos=kwargs.get('linenos', False))
    for block in code_blocks:
        if block.has_key('class'):
            language = block['class']
        else:
            language = 'text'
        try:
            lexer = get_lexer_by_name(language, stripnl=True, encoding='UTF-8')
        except ValueError, e:
            try:
                lexer = guess_lexer(block.renderContents())
            except ValueError, e:
                lexer = get_lexer_by_name('text', stripnl=True, encoding='UTF-8')
        empty_code_blocks[index].replaceWith(
                highlight(block.renderContents(), lexer, formatter))
        index = index + 1
    # This is ugly, but it's the easiest way to kill extraneous paragraphs Markdown inserts.
    return str(soup).replace('<p><div', '<div').replace('</div>\n\n</p>', '</div>\n\n')

formatter.register('typygmentdown', typygmentdown)
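The remove, format, re-insert approach used above can be tried in isolation with the standard library alone. This is a rough sketch under stated assumptions: it swaps in a regular expression where the snippet uses BeautifulSoup, `fake_markup` is a hypothetical stand-in for Markdown plus typogrify, and `protect_code` and `restore_code` are illustrative names, not part of any library.

```python
import re

# Matches <code> or <code class="...">; a simplification of what
# BeautifulSoup's findAll('code') handles in the snippet.
CODE_RE = re.compile(
    r'<code(?:\s+class="(?P<lang>[^"]*)")?>(?P<body>.*?)</code>', re.DOTALL)
PLACEHOLDER = '<code class="removed"></code>'


def protect_code(text):
    """Swap each <code> element for a placeholder; return (text, blocks)."""
    blocks = []

    def stash(match):
        blocks.append((match.group("lang"), match.group("body")))
        return PLACEHOLDER

    return CODE_RE.sub(stash, text), blocks


def restore_code(text, blocks, render):
    """Replace placeholders, in document order, with render(lang, body)."""
    pending = iter(blocks)

    def unstash(match):
        lang, body = next(pending)
        return render(lang, body)

    return re.sub(re.escape(PLACEHOLDER), unstash, text)


def fake_markup(text):
    """Hypothetical stand-in for Markdown + typogrify: '...' becomes an ellipsis."""
    return text.replace("...", u"\u2026")


source = 'Wait... <code class="python">x = [...]</code> and <code>y</code>'
protected, blocks = protect_code(source)
result = restore_code(
    fake_markup(protected), blocks,
    lambda lang, body: '<pre class="%s">%s</pre>' % (lang or "text", body))
```

The prose ellipsis is converted while the `[...]` inside the code element is left alone, which is the reason for pulling the elements out before formatting.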

Comments

ubernostrum (on August 22, 2007):

Hm.

Looks like the Pygments-highlighted div ends up inside a Markdown-generated p. Going to have to Soup that.

ubernostrum (on August 22, 2007):

OK, janky hack in place to remove extraneous p around div. I'll try to Soup this properly later.
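For reference, the hack amounts to two literal string replacements on the serialized output. The sketch below applies the same replacements to made-up markup; the `sample` and `near_miss` strings are assumptions about what the output might look like, and `strip_wrapping_paragraphs` is an illustrative name.

```python
def strip_wrapping_paragraphs(html):
    """Apply the same two literal replacements as the snippet's return line."""
    return html.replace('<p><div', '<div').replace('</div>\n\n</p>', '</div>\n\n')


# Assumed shape of the problem: a highlighted <div> wrapped in a <p>.
sample = ('<p>Intro</p>\n'
          '<p><div class="typygmentdown"><pre>x = 1</pre></div>\n\n</p>\n'
          '<p>Outro</p>')
cleaned = strip_wrapping_paragraphs(sample)

# The match is purely textual, so any variation in whitespace slips through.
near_miss = '<p> <div class="typygmentdown"><pre>x = 1</pre></div>\n</p>'
unchanged = strip_wrapping_paragraphs(near_miss)
```

Because the replacements are exact-text matches, they only work while the surrounding whitespace is exactly as expected, which is presumably why a proper BeautifulSoup pass is mentioned as the eventual fix.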

panosl (on January 19, 2010):

On line 64 (the htmlized = typogrify(markdown(str(soup), ...)) line), you might wanna change str(soup) to unicode(soup); otherwise, if the content you're passing contains characters outside of ASCII, it throws an exception (ordinal not in range).
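A small sketch of the kind of failure described here, assuming the exception comes from an implicit ASCII encode (which is what Python 2's str() does when handed a unicode object). The explicit `encode` calls stand in for that implicit step, and the sample text is made up.

```python
text = u"na\u00efve caf\u00e9"  # sample input with non-ASCII characters

reason = None
try:
    # Stand-in for the implicit ASCII encode behind str(unicode_obj) in Python 2.
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    reason = exc.reason

# Encoding explicitly as UTF-8 keeps every character and round-trips cleanly.
utf8_bytes = text.encode("utf-8")
round_tripped = utf8_bytes.decode("utf-8")
```

Keeping the text as unicode throughout, as the comment suggests, avoids the ASCII step altogether.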
