XhtmlDegraderMiddleware

"""
XHTML Degrader Middleware.

When sending contents with the XHTML media type, application/xhtml+xml, this
module checks to ensure that the user agent (web browser) is capable of
rendering it.  If not, it changes the media type to text/html and makes the
contents more "HTML-friendly" (as per the XHTML 1.0 HTML Compatibility
Guidelines).

To use this middleware you need to add a reference to it in your settings.py
file, e.g.:

    MIDDLEWARE_CLASSES = (
        ...
        'YOURPATH.xhtmldegrader.middleware.XhtmlDegraderMiddleware',
    )

(If you use GZipMiddleware, list it before XhtmlDegraderMiddleware.  Django
applies response middleware in reverse order, so this lets the XHTML Degrader
act on the content before it is compressed.)
"""

import re

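# Matches the XHTML media type in a Content-Type header value.
_MEDIA_TYPE_RE = re.compile(r'application/xhtml\+xml')

# Matches the '/>' of an empty element when no whitespace precedes it.
_EMPTY_TAG_END_RE = re.compile(r'(?<=\S)/>')

# Matches an XML processing instruction, e.g. <?xml version="1.0"?>.
# Non-greedy, so two instructions on one line are matched separately.
_PROCESSING_INSTRUCTION_RE = re.compile(r'<\?.*?\?>')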

def _supports_xhtml(request):
    """Examines the Accept header of an HTTP request to determine whether the
    user agent claims to support the XHTML media type (application/xhtml+xml).
    Returns True or False."""
    return '/xhtml+xml' in request.META.get('HTTP_ACCEPT', '').lower()

class XhtmlDegraderMiddleware(object):
    """Django middleware that "degrades" any contents sent as XHTML if the
    requesting browser doesn't support the XHTML media type.  Degrading involves
    switching the content type to text/html, removing XML processing
    instructions, etc.

    If the HTTP response is already set to text/html, or set to any media type
    other than application/xhtml+xml, this middleware will have no effect.
    """

    def process_response(self, request, response):
        # Check if this is XHTML, and check if the user agent supports XHTML.
        # Compare only the media type, ignoring parameters such as charset.
        media_type = response['Content-Type'].split(';')[0].strip()
        if media_type != 'application/xhtml+xml' or _supports_xhtml(request):
            # The content is fine, simply return it.
            return response
        # If the response has already been compressed we can't modify it
        # further, so just return it.  (N.B. if you use GZipMiddleware, you
        # should ensure that it's listed in MIDDLEWARE_CLASSES before
        # XhtmlDegraderMiddleware, to allow the XHTML Degrader to act first.)
        if response.has_header('Content-Encoding'):
            # Already compressed, so we can't do anything useful with it.
            return response
        # The content is XHTML, and the user agent doesn't support it.
        # Fix the media type:
        response['Content-Type'] = _MEDIA_TYPE_RE.sub('text/html',
                response['Content-Type'], 1)
        if 'charset' not in response['Content-Type']:
            response['Content-Type'] += '; charset=utf-8'
        # Modify the response contents as required:
        # Remove any XML processing instructions:
        response.content = _PROCESSING_INSTRUCTION_RE.sub('',
                response.content)
        # Ensure there's a space before the trailing '/>' of empty elements:
        response.content = _EMPTY_TAG_END_RE.sub(' />',
                response.content)
        # Lose any excess whitespace:
        response.content = response.content.strip()
        if not response.content.startswith('<!DOCTYPE'):
            # Add a DOCTYPE, so that the user agent isn't in "quirks" mode.
            response.content = '<!DOCTYPE html>\n' + response.content
        return response
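
The content changes made by process_response can be tried outside Django. The following is a minimal sketch, not part of the original snippet: degrade_xhtml is a hypothetical helper name, it works on a plain str rather than a response object, and it assumes the non-greedy form of the processing instruction pattern.

```python
import re

# Patterns equivalent to the middleware's, without redundant escapes.
# The non-greedy '.*?' is an assumption of this sketch.
PROCESSING_INSTRUCTION_RE = re.compile(r'<\?.*?\?>')
EMPTY_TAG_END_RE = re.compile(r'(?<=\S)/>')


def degrade_xhtml(content):
    """Hypothetical helper: apply the middleware's content fixes to a str."""
    # Remove XML processing instructions such as <?xml version="1.0"?>.
    content = PROCESSING_INSTRUCTION_RE.sub('', content)
    # Insert a space before '/>' where there is none.
    content = EMPTY_TAG_END_RE.sub(' />', content)
    # Trim surrounding whitespace, then make sure a DOCTYPE comes first.
    content = content.strip()
    if not content.startswith('<!DOCTYPE'):
        content = '<!DOCTYPE html>\n' + content
    return content


sample = '<?xml version="1.0"?>\n<html><body><br/><hr /></body></html>'
result = degrade_xhtml(sample)
print(result)
```

Here `<br/>` gains a space while `<hr />` is left alone, because the lookbehind only matches a '/>' that directly follows a non-whitespace character.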


Comments

Pistahh (on August 1, 2007):

instead of

response.content[0:9] != '<!DOCTYPE':

I would suggest

not response.content.startswith('<!DOCTYPE')

This is more speed & memory efficient.

#

robvdl (on September 13, 2007):

Great snippet, thanks.

One small suggestion: maybe you could also replace the media type in meta http-equiv="Content-type" content="application/xhtml+xml ... etc. in the head section. To do so, I added:

at the beginning:

_MEDIA_TYPE_HEAD_RE = re.compile(r'="application\/xhtml\+xml')

and around line 70 (where the response contents are modified) add:

response.content = _MEDIA_TYPE_HEAD_RE.sub('="text/html', response.content, 1)
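
As a rough illustration of what this substitution does, here is a small sketch run against a sample meta element. The sample markup and the variable names are assumptions made for the example; only the pattern and the replacement string come from the comment above (with the redundant escape dropped).

```python
import re

# Pattern from the comment above, without the unneeded '\/' escape.
MEDIA_TYPE_HEAD_RE = re.compile(r'="application/xhtml\+xml')

# Sample head content, invented for this example.
head = ('<meta http-equiv="Content-Type" '
        'content="application/xhtml+xml; charset=utf-8" />')

# Replace only the first occurrence, as in the suggested line.
fixed = MEDIA_TYPE_HEAD_RE.sub('="text/html', head, 1)
print(fixed)
```

The leading `="` in the pattern keeps the match tied to an attribute value, so the http-equiv attribute itself is left untouched.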

#

robvdl (on January 12, 2008):

I've made another little adjustment. I noticed that django-admin has several problems when served as application/xhtml+xml, so I adjusted this middleware so that anything served from the /admin URL is automatically degraded too:

    def process_response(self, request, response):
        if not request.META['PATH_INFO'].startswith('/admin/'):
            # Check if this is XHTML, and check if the user agent supports XHTML.
            if response['Content-Type'].split(';')[0] != 'application/xhtml+xml' or _supports_xhtml(request):
                # The content is fine, simply return it.
                return response
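
The decision described here can be written as a small standalone function. This is only a sketch under stated assumptions: should_degrade and its parameters are hypothetical names, and unlike the fragment above it still requires the response to be XHTML before forcing degradation under /admin/, so that other media types served from that path are left alone.

```python
def should_degrade(path, content_type, accepts_xhtml):
    """Hypothetical helper: decide whether a response should be degraded.

    path          -- request path, e.g. request.META['PATH_INFO']
    content_type  -- value of the response's Content-Type header
    accepts_xhtml -- True if the user agent claims XHTML support
    """
    media_type = content_type.split(';')[0].strip()
    if media_type != 'application/xhtml+xml':
        # Not XHTML, so there is nothing to degrade.
        return False
    # Always degrade under /admin/; elsewhere only for agents without support.
    return path.startswith('/admin/') or not accepts_xhtml


print(should_degrade('/admin/auth/', 'application/xhtml+xml', True))
print(should_degrade('/blog/', 'application/xhtml+xml', True))
```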

#

robvdl (on July 6, 2008):

Looks like there is a bit too much escaping going on in the regular expressions:

_MEDIA_TYPE_RE = re.compile(r'application\/xhtml\+xml')

Can simply be:

_MEDIA_TYPE_RE = re.compile(r'application/xhtml\+xml')

and

_PROCESSING_INSTRUCTION_RE = re.compile(r'\<\?.*\?\>')

Can simply be:

_PROCESSING_INSTRUCTION_RE = re.compile(r'<\?.+\?>')

The following regex didn't make much sense to me at all. I don't know how the = fits in there, as well as some of the other parts. Here is the original:

_EMPTY_TAG_END_RE = re.compile(r'(?<=\S)\/\>')

Anyway, I rewrote it and this one works:

_EMPTY_TAG_END_RE = re.compile(r'(<.+)/>')

Also replace line 75 (the _EMPTY_TAG_END_RE.sub call) with this, or it won't work:

response.content = _EMPTY_TAG_END_RE.sub(r'\1>', response.content)

What this does is replace:

<input type="button" /> with <input type="button" > and <input type="button"/> with <input type="button">

I cannot get rid of the space at the end in the first example, not without adding extra python code anyway, and since the extra space does nothing bad, I just left it at that.
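
To compare the patterns discussed in this comment, the sketch below runs them over one line that holds two empty elements. The sample line and the third pattern are assumptions made for the example and do not come from the snippet: STRIP_SLASH_RE is a hypothetical alternative that also removes whitespace before the slash.

```python
import re

LOOKBEHIND_RE = re.compile(r'(?<=\S)/>')  # original pattern from the snippet
GREEDY_RE = re.compile(r'(<.+)/>')        # rewritten pattern from this comment
STRIP_SLASH_RE = re.compile(r'\s*/>')     # hypothetical alternative

line = '<br/><input type="button" />'

# (?<=\S) is a lookbehind: match '/>' only right after a non-space character.
spaced = LOOKBEHIND_RE.sub(' />', line)

# '.+' is greedy, so the group runs to the last '/>' on the line.
greedy = GREEDY_RE.sub(r'\1>', line)

# Drop the slash together with any whitespace in front of it.
stripped = STRIP_SLASH_RE.sub('>', line)

print(spaced)    # <br /><input type="button" />
print(greedy)    # <br/><input type="button" >
print(stripped)  # <br><input type="button">
```

With more than one empty element on a line, the greedy group changes only the last one, while the other two patterns handle each element separately.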

#

robvdl (on July 6, 2008):

Sorry, I meant a * not a +:

_PROCESSING_INSTRUCTION_RE = re.compile(r'<\?.*\?>')

#
