Template tag for importing content from an external URL

from django import template

register = template.Library()

@register.simple_tag
def geturl(url, timeout=None):
    """
    Usage: {% geturl url [timeout] %}

    Examples:
    {% geturl "http://example.com/path/to/content/" %}
    {% geturl object.urlfield 1 %}
    """
    import socket
    from urllib2 import urlopen
    # Remember the process-wide default so it can be restored afterwards.
    socket_default_timeout = socket.getdefaulttimeout()
    if timeout is not None:
        try:
            socket_timeout = float(timeout)
        except (TypeError, ValueError):
            raise template.TemplateSyntaxError("timeout argument of geturl tag, if provided, must be convertible to a float")
        try:
            socket.setdefaulttimeout(socket_timeout)
        except ValueError:
            raise template.TemplateSyntaxError("timeout argument of geturl tag, if provided, cannot be less than zero")
    try:
        try:
            content = urlopen(url).read()
        finally:  # reset socket timeout
            if timeout is not None:
                socket.setdefaulttimeout(socket_default_timeout)
    except Exception:
        # Any failure (timeout, HTTP error, bad URL) renders as empty content.
        content = ''
    return content
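
On interpreters where urlopen accepts a per-call timeout argument, the global default does not need to be touched at all. The following is a minimal sketch of that variant, assuming Python 3 (where urllib2.urlopen became urllib.request.urlopen). The names fetch_content and opener are hypothetical and not part of the snippet; opener exists only so the function can be exercised without a network connection. Note that read() returns bytes on Python 3.

```python
from urllib.request import urlopen


def fetch_content(url, timeout=None, opener=urlopen):
    """Return the body at `url`, or '' if the request fails.

    Hypothetical helper: `opener` defaults to urllib.request.urlopen and
    can be swapped for a stub when testing.
    """
    kwargs = {}
    if timeout is not None:
        seconds = float(timeout)  # raises ValueError/TypeError if not numeric
        if seconds < 0:
            raise ValueError("timeout cannot be less than zero")
        kwargs["timeout"] = seconds  # applies to this call only
    try:
        response = opener(url, **kwargs)
        try:
            return response.read()
        finally:
            response.close()
    except Exception:
        # Mirror the tag's behaviour: any fetch failure yields empty content.
        return ''
```

Because the timeout is passed to the individual call, nothing has to be restored afterwards and other code creating sockets is unaffected.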

Comments

kioopi (on April 24, 2008):

This screams xss-vulnerability. Do you have control over the urls you include?

dchandek (on April 24, 2008):

The intention is similar to the [HTML_REMOVED] tag of the JavaServer Pages Standard Tag Library (JSTL). It's up to the developer to use trusted sources. :)
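
One way to enforce "trusted sources" in code rather than by convention is to check each URL against an allowlist before fetching it. This is only a sketch in Python 3; TRUSTED_HOSTS and is_trusted are hypothetical names, and the hosts listed are placeholders.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the hosts you actually control.
TRUSTED_HOSTS = {"example.com", "static.example.com"}


def is_trusted(url):
    """Return True only for http(s) URLs whose host is on the allowlist."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and parts.hostname in TRUSTED_HOSTS
```

A tag could call such a check first and return empty content for anything that fails it.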

dchandek (on April 24, 2008):

Updated to use template.Variable class from Django dev version. Also switched to urllib2 because urllib2.urlopen raises urllib2.HTTPError on a 404, whereas urllib.urlopen simply returns the 404 page content (bad).
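
The same distinction can be illustrated on Python 3, where urllib.request.urlopen raises urllib.error.HTTPError for a 404 response. This is a sketch under that assumption; body_or_empty and its opener parameter are hypothetical names, with opener present only so the behaviour can be shown with a stub instead of a real server.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def body_or_empty(url, opener=urlopen):
    """Return the response body, or '' when the server answers with an error status."""
    try:
        response = opener(url)
    except HTTPError:
        # 4xx/5xx responses land here instead of being returned as content.
        return ''
    try:
        return response.read()
    finally:
        response.close()
```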

dchandek (on April 24, 2008):

Updated to use the simple_tag shortcut. :)

cmheisel (on April 24, 2008):

If you use this in production there are a few things to be wary of (we've done something similar for trusted URLs at my job).

  • You'll want to set a timeout so slow servers don't hold up your page
  • You'll want some (serious) caching

From what I can tell, the way to set a timeout is: import socket; socket.setdefaulttimeout(seconds)

dchandek (on April 24, 2008):

Good point on the timeout issue. Caching, of course, can be handled using the {% cache %} template tag.
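
{% cache %} works at the template level. The underlying idea can also be sketched in plain Python: remember each fetched body for a fixed number of seconds so repeated renders do not hit the remote server. Everything named here (TTLCache, fetch, ttl, clock) is hypothetical and independent of Django's cache framework, which a real deployment would more likely use.

```python
import time


class TTLCache:
    """Tiny in-process cache: values expire `ttl` seconds after being stored."""

    def __init__(self, fetch, ttl, clock=time.monotonic):
        self._fetch = fetch      # callable taking a URL, returning content
        self._ttl = ttl
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # url -> (expires_at, content)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: no remote call
        content = self._fetch(url)
        self._entries[url] = (now + self._ttl, content)
        return content
```

This cache lives in a single process and is not thread-safe, so it only illustrates the expiry logic.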

dchandek (on April 24, 2008):

Updated to deal with timeout issue. I have a minimal understanding of sockets and threads, so I played it safe by getting current default socket timeout at the outset and resetting the default at the end.
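
The save-and-restore step can be packaged as a context manager so the reset happens even when the fetch raises. A minimal sketch, assuming Python 2.5+ or 3 with contextlib; default_socket_timeout is a hypothetical name, not something from the snippet or the standard library.

```python
import socket
from contextlib import contextmanager


@contextmanager
def default_socket_timeout(seconds):
    """Temporarily change the process-wide default socket timeout."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)  # raises ValueError if seconds < 0
    try:
        yield
    finally:
        socket.setdefaulttimeout(previous)  # restore even if the body raised
```

The default is shared by the whole process, so sockets created by other threads while the block is active also pick up the temporary value.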
