This filter converts HTML to nicely formatted text using the text browser W3M. I use it to construct e-mail bodies, so I don't need two templates, one HTML and one plain-text, for each detailed e-mail I want to send. Besides the obvious maintenance benefit, this is useful because Django's templating system isn't well suited to plain text, where whitespace and line breaks are significant.
I chose W3M because it renders tables nicely and can take in HTML from STDIN (which Lynx can't do). An alternative is ELinks; to use it, change "cmd" to the following:
elinks -force-html -stdin -dump -no-home
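If either browser might be installed on a given machine, the choice of command could be automated. The following is only a sketch: `CANDIDATES` and `pick_cmd` are hypothetical names, not part of the snippet, and the two command lines are the ones quoted above.

```python
import shutil

# Command lines taken from the description above; the order expresses a
# preference for W3M and is an assumption, not a requirement.
CANDIDATES = [
    "w3m -dump -T text/html -O ascii",
    "elinks -force-html -stdin -dump -no-home",
]


def pick_cmd(candidates=CANDIDATES, which=shutil.which):
    """Return the first command whose executable is found on PATH, or None."""
    for cmd in candidates:
        executable = cmd.split()[0]
        if which(executable):
            return cmd
    return None
```

The returned string could then be used as the value of "cmd" in the filter. The `which` parameter exists only so the lookup can be replaced in tests.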
from subprocess import Popen, PIPE

from django import template

register = template.Library()


@register.filter
def html2text(value):
    """
    Pipes given HTML string into the text browser W3M, which renders it.
    Rendered text is grabbed from STDOUT and returned.
    """
    try:
        cmd = "w3m -dump -T text/html -O ascii"
        # Run without a shell so a missing executable raises OSError.
        proc = Popen(cmd.split(), stdin=PIPE, stdout=PIPE)
        # communicate() takes and returns bytes on Python 3.
        out = proc.communicate(str(value).encode("utf-8"))[0]
        return out.decode("ascii", "replace")
    except OSError:
        # the browser could not be started, so just return the input
        return value


if __name__ == "__main__":
    from urllib.request import urlopen
    page = urlopen("http://www.w3.org/TR/REC-html40/").read()
    print(html2text(page.decode("utf-8", "replace")))
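To illustrate the single-template idea, here is a minimal sketch of building a message with both a plain-text and an HTML body from one HTML string. It uses only the standard library's email package rather than Django's mail classes, and `build_message` and its `to_text` parameter are hypothetical names introduced for this example.

```python
from email.message import EmailMessage


def build_message(subject, sender, recipient, html, to_text):
    """Build a multipart/alternative message from a single HTML source.

    to_text is any callable that turns HTML into plain text, for example
    the html2text filter function above.
    """
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    # Plain-text part, derived from the HTML.
    msg.set_content(to_text(html))
    # HTML part; this turns the message into multipart/alternative.
    msg.add_alternative(html, subtype="html")
    return msg
```

In a Django project the HTML would typically come from a rendered template, and the same idea should carry over to Django's own e-mail classes.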
Comments
cmd = "lynx -force_html -stdin -dump"
works for me.
#
What's wrong with
#
Right you are, but I don't see any reason to use Lynx. Both W3M and ELinks render HTML much better than Lynx.
The etree.tostring() method is just a serialization method. It is not an HTML rendering engine, which is what W3M and ELinks provide.
That script would be appropriate for very simple HTML, but ~400 lines of Python can't replace a complete rendering engine.
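The difference between serializing and rendering can be seen in a small sketch. It assumes the standard library's xml.etree.ElementTree, whose tostring() may differ in detail from whichever etree was meant above, and the table markup is made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up two-row table, used only for illustration.
html = (
    "<table>"
    "<tr><td>Name</td><td>Qty</td></tr>"
    "<tr><td>Apples</td><td>3</td></tr>"
    "</table>"
)
root = ET.fromstring(html)

# Serialization: the tree is written back out as markup, tags included.
markup = ET.tostring(root, encoding="unicode")

# Text-only serialization: the tags are dropped, but so is all layout,
# so cells and rows run together on one line.
flat = ET.tostring(root, encoding="unicode", method="text")

print(markup)
print(flat)
```

Neither result lays the cells out in columns, which is the part a text browser handles.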
#
Great idea. I'm trying this and am getting:
After a little debugging, I haven't found a solution. Any ideas?
#