- August 10, 2008
This filter converts HTML to nicely formatted text using the text browser W3M. I use it to construct e-mail bodies, so I don't need two templates, one HTML and one plain-text, for each detailed e-mail I want to send. Besides the obvious maintenance benefit, this is nice because Django's templating system isn't well suited to plain text, where whitespace and line breaks are significant.
I chose W3M because it renders tables nicely and can take in HTML from STDIN (which Lynx can't do). An alternative is ELinks; to use it, change "cmd" to the following:
elinks -force-html -stdin -dump -no-home
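If the snippet has to run on machines where only one of the two browsers is installed, the command could be picked at import time. The following is a minimal sketch and not part of the original snippet: `CANDIDATES` and `pick_dump_command` are hypothetical names, and the lookup relies on `shutil.which` from the Python 3 standard library.

```python
import shutil

# The two commands mentioned in this snippet, in order of preference.
CANDIDATES = [
    "w3m -dump -T text/html -O ascii",
    "elinks -force-html -stdin -dump -no-home",
]


def pick_dump_command(which=shutil.which):
    """Return the first candidate whose executable is on PATH, else None.

    `which` is injectable so the lookup can be exercised without either
    browser being installed.
    """
    for cmd in CANDIDATES:
        executable = cmd.split()[0]
        if which(executable) is not None:
            return cmd
    return None
```

The result can be assigned to "cmd" in the filter; a `None` result means neither browser was found, in which case returning the input unchanged matches the filter's existing fallback.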
from subprocess import Popen, PIPE

from django import template

register = template.Library()


@register.filter
def html2text(value):
    """
    Pipes the given HTML string into the text browser W3M, which renders it.
    The rendered text is read from STDOUT and returned.
    """
    cmd = "w3m -dump -T text/html -O ascii"
    try:
        # No shell is needed; a missing executable then raises OSError
        proc = Popen(cmd.split(), stdin=PIPE, stdout=PIPE)
        # communicate() returns a (stdout, stderr) tuple; keep only stdout
        out, _ = proc.communicate(str(value).encode("utf-8"))
    except OSError:
        # something bad happened, so just return the input
        return value
    return out.decode("ascii", "replace")


if __name__ == "__main__":
    from urllib.request import urlopen
    html = urlopen("http://www.w3.org/TR/REC-html40/").read()
    print(html2text(html.decode("utf-8", "replace")))
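To illustrate the single-template idea for e-mail bodies, here is a rough sketch that builds a message with both a plain-text and an HTML part from one HTML string. It uses the standard library's `email.message.EmailMessage` rather than Django's mail API, and `build_message` and its `to_text` parameter are hypothetical names; `to_text` stands in for any HTML-to-text callable, such as the filter function above.

```python
from email.message import EmailMessage


def build_message(subject, sender, recipient, html, to_text):
    """Build a multipart/alternative e-mail from a single HTML body.

    `to_text` is any callable that turns HTML into plain text, for
    example the html2text filter function (an assumption of this sketch).
    """
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    # The plain-text part is derived from the HTML, not written separately.
    msg.set_content(to_text(html))
    # Adding the HTML as an alternative makes the message multipart/alternative.
    msg.add_alternative(html, subtype="html")
    return msg
```

Passing the converter in as an argument keeps the sketch independent of W3M, so it can be tried with a trivial stand-in such as `lambda html: "Hello"`.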