[middleware] Rewrite anchors to point into Coral CDN

Author:
crime_minister
Posted:
July 29, 2008
Language:
Python
Version:
.96
Tags:
middleware coral cdn anchor-rewrite
Score:
1 (after 1 ratings)

This simple middleware rewrites the 'href' attribute of every <a> tag in your response content. It appends the string '.nyud.net' to the hostname of each absolute URL, which causes the Coral Content Distribution Network to retrieve a copy of the linked page and cache it before returning it to the user agent. Relative (local) links are left unchanged.
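To illustrate the URL rewriting on its own, outside any middleware, here is a minimal sketch. The function name `coralize` and the constant `CORAL_SUFFIX` are made up for this example and are not part of the snippet; the import fallback is only there so the sketch runs on both Python 2 and Python 3.

```python
try:
    from urllib.parse import urlsplit, urlunsplit  # Python 3
except ImportError:
    from urlparse import urlsplit, urlunsplit      # Python 2

CORAL_SUFFIX = '.nyud.net'  # hypothetical constant name for this sketch


def coralize(url):
    """Return `url` with the Coral suffix appended to its hostname.

    URLs without a network location (relative links) are returned
    unchanged. Assumes the netloc carries no 'user:password@' part.
    """
    parts = urlsplit(url)
    if not parts.netloc:
        return url
    # Keep any ':port' after the suffixed hostname.
    host, sep, port = parts.netloc.partition(':')
    netloc = host + CORAL_SUFFIX + sep + port
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))


print(coralize('http://example.com/page.html'))
print(coralize('http://example.com:8080/a?b=1#c'))
print(coralize('/local/path/'))
```

The first two calls should print the URL with '.nyud.net' inserted after the hostname and before any port; the third should print the relative path untouched.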

This might be useful if you're writing another Slashdot and you want to avoid turning the servers you link to into smoking craters.

You should be able to apply this functionality to a single view as well (though I haven't tried this yet).
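As a rough, untested-against-Django sketch of the per-view idea: a decorator can instantiate a middleware class and pass each response from the view through its process_response() method. The helper name `response_middleware_decorator` and the stand-in classes below are hypothetical and exist only so the example runs without Django; Django itself provides `django.utils.decorators.decorator_from_middleware` for this purpose, which is probably the better choice in a real project.

```python
from functools import wraps


def response_middleware_decorator(middleware_class):
    """Hypothetical helper: run one middleware's process_response()
    on the response of a single view."""
    def decorator(view):
        middleware = middleware_class()

        @wraps(view)
        def wrapped(request, *args, **kwargs):
            response = view(request, *args, **kwargs)
            return middleware.process_response(request, response)
        return wrapped
    return decorator


# Stand-ins so the sketch runs without Django installed.
class FakeResponse(object):
    def __init__(self, content):
        self.content = content


class UppercaseMiddleware(object):
    """Placeholder for a middleware such as CoralCDNMiddleware."""
    def process_response(self, request, response):
        response.content = response.content.upper()
        return response


@response_middleware_decorator(UppercaseMiddleware)
def my_view(request):
    return FakeResponse('hello')


print(my_view(None).content)
```

With the real middleware, you would pass CoralCDNMiddleware instead of the placeholder class and decorate only the views whose outgoing links should go through Coral.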

import re
from urlparse import urlsplit, urlunsplit

_coral_suffix = '.nyud.net'

# Match one <a> tag at a time. The negated character classes stop the
# match at the end of the href value and at the closing '>' of the tag,
# so several anchors on the same line are each rewritten separately
# (a greedy '.*' would swallow everything up to the last one).
_regex = r'(?P<prefix><a\b[^>]*?\shref=")(?P<url>[^"]*)(?P<suffix>"[^>]*>)'
_anchor_regex = re.compile( _regex )


class CoralCDNMiddleware(object):
    """
    This middleware rewrites anchor tags contained in the response
    content so that the pages are fetched through the Coral Content
    Distribution Network [http://coralcdn.org/].
    """
    def process_response(self, request, response):
        # Function called by re.sub() to compute the replacement value
        # for any matches it finds.
        def a_replacer( match ):
            # The URL is captured by a named group in the regex.
            url = match.group( 'url' )
            parts = urlsplit( url )
            # Append the Coral CDN suffix to the 'netloc' URL part,
            # assuming it's there. If not, we're looking at a local
            # reference, so there is no need to rewrite the URL.
            if parts.netloc:
                # Append the suffix before any port number. This
                # assumes the netloc has no 'user:password@' part.
                netloc_parts = parts.netloc.split( ':' )
                netloc_parts[0] += _coral_suffix

                # Replace the 'netloc' part of the urlsplit() result
                # tuple.
                parts = list( parts )
                parts[1] = ':'.join( netloc_parts )

                # Rebuild the anchor tag around the new URL.
                prefix = match.group( 'prefix' )
                suffix = match.group( 'suffix' )
                anchor = prefix + urlunsplit( parts ) + suffix
            else:
                # Local reference: return the matched tag unchanged.
                anchor = match.group()

            return anchor

        # Find all anchor tags in the response content and rewrite
        # them.
        response.content = _anchor_regex.sub( a_replacer, response.content )
        return response

Comments

keeper (on July 1, 2012):

It looks like this middleware does not check whether it is Coral itself trying to access a page. Does Coral remove '.nyud.net' from such URLs to reach the real file that it needs to cache?
