[middleware] Rewrite anchors to point into Coral CDN

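import re

try:
    from urllib.parse import urlsplit, urlunsplit  # Python 3
except ImportError:
    from urlparse import urlsplit, urlunsplit      # Python 2

try:
    # Django 1.10+ needs this mixin for old-style process_response() hooks.
    from django.utils.deprecation import MiddlewareMixin
except ImportError:
    MiddlewareMixin = object

_coral_suffix = '.nyud.net'

# Match one anchor tag at a time. The character classes stop each group
# at the first '"' or '>', so several anchors on the same line are
# rewritten individually (a greedy '.*' would swallow them all as one match).
_regex = r'(?P<prefix><a\s[^>]*?href=")(?P<url>[^"]*)(?P<suffix>"[^>]*>)'
_anchor_regex = re.compile(_regex, re.IGNORECASE)


class CoralCDNMiddleware(MiddlewareMixin):
    """
    This middleware rewrites anchor tags contained in the response
    content so that the pages are fetched through the Coral Content
    Distribution Network [http://coralcdn.org/].
    """
    def process_response(self, request, response):
        # Only rewrite complete HTML bodies; leave images, JSON,
        # downloads and streaming responses untouched.
        if getattr(response, 'streaming', False):
            return response
        if 'text/html' not in response.get('Content-Type', ''):
            return response

        # Function called by re.sub() to compute the replacement value
        # for any matches it finds.
        def a_replacer(match):
            # The URL is captured by a named group in the regex.
            parts = urlsplit(match.group('url'))
            # Relative URLs have no 'netloc' part: they are local
            # references, so there is no need to rewrite them.
            if not parts.netloc:
                return match.group()

            # Split off any "user:password@" prefix and ":port" suffix
            # so the Coral suffix is appended to the host name only.
            userinfo, at, hostport = parts.netloc.rpartition('@')
            host, colon, port = hostport.partition(':')
            # Don't append the suffix twice to an already-Coralized URL.
            if host.endswith(_coral_suffix):
                return match.group()
            netloc = userinfo + at + host + _coral_suffix + colon + port

            # Replace the 'netloc' part of the urlsplit() result and put
            # the new URL back between the prefix and suffix groups.
            new_url = urlunsplit(parts._replace(netloc=netloc))
            return match.group('prefix') + new_url + match.group('suffix')

        # Find all anchor tags in the response content and rewrite them.
        # On Python 3, Django stores the body as bytes, so decode first.
        content = response.content
        if isinstance(content, bytes) and not isinstance(content, str):
            charset = getattr(response, 'charset', None) or 'utf-8'
            content = content.decode(charset)
            content = _anchor_regex.sub(a_replacer, content).encode(charset)
        else:
            content = _anchor_regex.sub(a_replacer, content)
        response.content = content

        # Keep Content-Length in sync with the rewritten body.
        if response.has_header('Content-Length'):
            response['Content-Length'] = str(len(response.content))
        return response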
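To try the rewrite without a Django project, the transformation can be pulled out into two plain functions. This is a minimal sketch assuming Python 3; the names coralize_url, coralize_html, CORAL_SUFFIX and ANCHOR_RE are hypothetical and not part of the snippet. It uses a tightened regex that matches one anchor at a time, so two links on the same line are each rewritten.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical standalone version of the middleware's rewriting step.
CORAL_SUFFIX = '.nyud.net'

# One <a ... href="..."...> tag per match; [^"] and [^>] keep each
# match from running into the next anchor on the same line.
ANCHOR_RE = re.compile(
    r'(?P<prefix><a\s[^>]*?href=")(?P<url>[^"]*)(?P<suffix>"[^>]*>)',
    re.IGNORECASE,
)


def coralize_url(url, suffix=CORAL_SUFFIX):
    """Append the Coral suffix to the host of an absolute URL."""
    parts = urlsplit(url)
    if not parts.netloc:
        return url  # relative link: leave it alone
    # Keep any "user:password@" prefix and ":port" suffix intact.
    userinfo, at, hostport = parts.netloc.rpartition('@')
    host, colon, port = hostport.partition(':')
    if host.endswith(suffix):
        return url  # already routed through Coral
    netloc = userinfo + at + host + suffix + colon + port
    return urlunsplit(parts._replace(netloc=netloc))


def coralize_html(html):
    """Rewrite the href of every anchor tag in an HTML string."""
    def replace(match):
        new_url = coralize_url(match.group('url'))
        return match.group('prefix') + new_url + match.group('suffix')
    return ANCHOR_RE.sub(replace, html)


if __name__ == '__main__':
    page = ('<a href="http://example.com:8080/a">A</a> '
            '<a href="/local">B</a>')
    print(coralize_html(page))
    # <a href="http://example.com.nyud.net:8080/a">A</a> <a href="/local">B</a>
```

The port stays after the new host name, the relative /local link is left as it was, and a URL that already ends in .nyud.net is not suffixed a second time.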

More like this

  1. Template tag to sort a list of links by pytechd 6 years, 8 months ago
  2. SSL / HTTPS Middleware for Redirection and href Rewriting by DrMeers 3 years, 11 months ago
  3. Encode emails as URIs by fahhem 3 years, 8 months ago
  4. Fix duplicate first page of paginated results by muhuk 4 years, 7 months ago
  5. XhtmlDegraderMiddleware by dmh 6 years, 8 months ago

Comments

keeper (on July 1, 2012):

It looks like this middleware doesn't check whether it's Coral itself trying to access a page. Does Coral remove '.nyud.net' from such URLs to fetch the real file it needs to cache?
