Strip trailing .html extensions from URLs so that existing bookmarks work for a legacy site ported to Django

Author: daverowell
Posted: May 15, 2007
Language: Python
Version: .96
Tags: middleware url_rewrite strip_dot_html_from_urls
Score: 2 (after 2 ratings)

1) You've ported an existing web site to Django.

2) The new site URLs don't have html (or htm) extensions.

3) The old site had URLs with html extensions.

4) You want existing bookmarks to work.

Use this middleware to remove trailing .htm and .html extensions from incoming URLs (GET requests only) so that your new site honors existing bookmarks. Place it in settings.MIDDLEWARE_CLASSES near CommonMiddleware, because it has similar requirements for its position in the middleware stack. If an incoming URL has an .htm or .html extension, ZapDotHtmlMiddleware strips it and issues a permanent redirect to the trimmed URL.
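As a rough sketch of the placement described above, a settings fragment might look like the following. The module path myproject.middleware.ZapDotHtmlMiddleware is hypothetical (use wherever you saved the class), and putting it directly after CommonMiddleware is only one reading of "near".

```python
# settings.py (fragment)
# Assumption: the snippet was saved as myproject/middleware.py.
# That dotted path is hypothetical; adjust it to your project layout.
MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    'myproject.middleware.ZapDotHtmlMiddleware',  # hypothetical path
    # ... the rest of your middleware ...
)
```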

from django import http
import re

class ZapDotHtmlMiddleware(object):
    """Removes trailing .htm and .html extensions from incoming URLs
    (GETs only) so that a legacy site ported to Django can continue to
    support existing bookmarks. Locate in settings.MIDDLEWARE_CLASSES near
    CommonMiddleware (similar middleware stack location requirements)."""
    # Dave Rowell, Appropriate Solutions, Inc., www.appropriatesolutions.com

    def __init__(self):
        # RE match for .htm or .html at the end of the url, possibly
        # followed by /, but not including it. Compile once, use many.
        # (Uses re; the sre module name is deprecated.)
        self.re_trim_html = re.compile(r'\.html?(?=/?$)', re.IGNORECASE)

    def process_request(self, request):
        """ Rewrite incoming URL if it contains an htm or html extension."""
        if request.method == 'GET':
            #Excise any .html ending.
            new_path = self.re_trim_html.sub('', request.path)
            if new_path != request.path:
                # URL was trimmed; redirect.
                # (Borrowed from django.middleware.common.CommonMiddleware.)
                host = http.get_host(request)
                if host:
                    newurl = "%s://%s%s" % (request.is_secure() and 'https' or 'http', host, new_path)
                else:
                    # No host available; fall back to a relative redirect.
                    newurl = new_path
                urlencode = request.GET.urlencode()
                if urlencode:
                    newurl += '?' + urlencode
                return http.HttpResponsePermanentRedirect(newurl)

        return None
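
To see which paths the middleware would rewrite, the pattern can be tried on its own, outside Django. This is only an illustration: the helper name strip_html_extension and the sample paths are made up for the example, and the pattern is the one compiled in __init__ above.

```python
import re

# Same pattern as the middleware: .htm or .html at the end of the path,
# optionally followed by a single trailing slash (which is kept).
TRIM_HTML = re.compile(r'\.html?(?=/?$)', re.IGNORECASE)

def strip_html_extension(path):
    """Hypothetical helper: return path with a trailing .htm/.html removed."""
    return TRIM_HTML.sub('', path)

for path in ['/about.html', '/docs/intro.HTM', '/news.html/',
             '/plain/', '/a.html/b']:
    print(path, '->', strip_html_extension(path))
```

Only an extension at the very end of the path (or just before a trailing slash) is removed, so a path such as /a.html/b is left alone and causes no redirect.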
