[middleware] Rewrite anchors to point into Coral CDN

Author:
crime_minister
Posted:
July 29, 2008
Language:
Python
Version:
.96
Score:
1 (after 1 ratings)

This simple middleware rewrites the 'href' attribute of every <a> tag in your response content. It appends the string '.nyud.net' to the hostname of each absolute URL, which causes the Coral Content Distribution Network to retrieve a copy of the page and cache it before returning it to the user agent. Relative links, which have no hostname, are left unchanged.
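To make the URL transformation concrete, here is a minimal standalone sketch of the hostname rewrite on its own, outside any middleware. The names `coralize` and `CORAL_SUFFIX` are illustrative and not part of the snippet, and the import fallback is only there so the sketch runs on both Python 2 and Python 3.

```python
try:
    from urllib.parse import urlsplit, urlunsplit  # Python 3
except ImportError:
    from urlparse import urlsplit, urlunsplit      # Python 2

CORAL_SUFFIX = '.nyud.net'  # illustrative name, mirrors _coral_suffix


def coralize(url):
    """Return url with the Coral suffix appended to its hostname.

    URLs without a network location (relative links) are returned as-is.
    """
    parts = urlsplit(url)
    if not parts.netloc:
        return url
    # Keep any port number after the suffixed hostname.
    host, sep, port = parts.netloc.partition(':')
    netloc = host + CORAL_SUFFIX + sep + port
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))


print(coralize('http://example.com:8080/a?b=1#c'))
print(coralize('/local/page'))
```

The first call should yield a URL on `example.com.nyud.net:8080`, while the second is returned untouched because it has no hostname to rewrite.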

This might be useful if you're writing another Slashdot and you want to avoid turning the servers you link to into smoking craters.

You should be able to apply this functionality to a single view as well (though I haven't tried this yet).
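One way the per-view idea might look is a small decorator that calls a middleware's process_response() on the result of a single view. This is an untested-in-Django sketch: `apply_middleware_response`, `FakeResponse` and `UppercaseMiddleware` are hypothetical stand-ins so the example runs without Django, and in a real project you would pass CoralCDNMiddleware instead. Django also provides `django.utils.decorators.decorator_from_middleware`, which may be the better fit if it behaves as expected in your version.

```python
import functools


def apply_middleware_response(middleware_class):
    """Hypothetical helper: run a middleware's process_response on one view."""
    middleware = middleware_class()

    def decorator(view):
        @functools.wraps(view)
        def wrapped(request, *args, **kwargs):
            response = view(request, *args, **kwargs)
            return middleware.process_response(request, response)
        return wrapped
    return decorator


# Stand-ins so the sketch runs without Django. In a project these would be
# CoralCDNMiddleware and a real HttpResponse.
class FakeResponse(object):
    def __init__(self, content):
        self.content = content


class UppercaseMiddleware(object):
    def process_response(self, request, response):
        response.content = response.content.upper()
        return response


@apply_middleware_response(UppercaseMiddleware)
def my_view(request):
    return FakeResponse('<a href="http://example.com/">link</a>')
```

Only views wrapped this way have their responses rewritten, so the rest of the site is unaffected.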

import re
from urlparse import urlsplit, urlunsplit

_coral_suffix = '.nyud.net'

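# Match one anchor tag at a time: [^>]* and [^"]* keep each match inside
# a single tag, where a greedy '.*' could span several anchors on a line.
_regex = r'(?P<prefix><a\b[^>]*?\bhref=")(?P<url>[^"]*)(?P<suffix>"[^>]*>)'
_anchor_regex = re.compile( _regex )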


class CoralCDNMiddleware(object):
    """
    This middleware rewrites anchor tags contained in the response
    content so that the pages are fetched through the Coral Content
    Distribution Network [http://coralcdn.org/].
    """
    def process_response(self, request, response):
        # Function called by re.sub() to compute the replacement value
        # for any matches it finds.
        def a_replacer( match ):
            # The URL is captured by a named group in the regex.
            url = match.group( 'url' )
            parts = urlsplit( url )
            # Append the Coral CDN suffix to the 'netloc' URL part,
            # assuming it's there. If not, we're looking at a local
            # reference, so there is no need to rewrite the URL.
            if parts.netloc:
                # Append the suffix before any port number.
                netloc_parts = parts.netloc.split( ':' )
                netloc_parts[0] += _coral_suffix

                # Replace the 'netloc' part of the urlsplit() result
                # tuple.
                parts = list( parts )
                parts[1] = ':'.join( netloc_parts )

                # Replace the named group 'url' in the match with the
                # new URL.
                prefix = match.group( 'prefix' )
                suffix = match.group( 'suffix' )
                anchor = prefix + urlunsplit( parts ) + suffix
            else:
                anchor = match.group()
                
            return anchor

        # Find all anchor tags in the response content and rewrite
        # them.
        response.content = _anchor_regex.sub( a_replacer, response.content )
        return response
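
To enable the middleware site-wide, it would be listed in the project's settings. The fragment below is a sketch for the Django version noted above; the module path `myproject.middleware` is an assumption and should be replaced with wherever you saved the class.

```python
# settings.py (fragment)
MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    # Hypothetical path: adjust to the module that holds the class.
    'myproject.middleware.CoralCDNMiddleware',
)
```

Response middleware runs in reverse order of this list, so the Coral middleware should sit below anything that compresses or otherwise encodes the response body, letting it see the plain HTML first.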

Comments

keeper (on July 1, 2012):

It looks like this middleware doesn't check whether it's Coral itself trying to access a page. Does Coral remove '.nyud.net' from such URLs to access the real file it needs to cache?
