Development middleware to ensure that responses validate as HTML.
# encoding: utf-8
#
# Copyright (c) 2009 Thomas Kongevold Adamcik
#
# Snippet is released under the MIT License. So feel free to use it in other
# projects as long as the notice remains intact :)
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
# See http://www.djangosnippets.org/snippets/1312/
'''
HTML Validation Middleware
==========================

Simple development middleware to ensure that responses validate as HTML.

Dependencies:
-------------

- tidy (http://utidylib.berlios.de/)

Installation:
-------------

Assuming this file has been placed on your PYTHONPATH (e.g.
djangovalidation/middleware.py), simply add the following
to your middleware settings:

    'djangovalidation.middleware.HTMLValidationMiddleware',

Remember that the order of your middleware settings does matter: this
middleware should be placed before e.g. GzipMiddleware, djangologging and
any other middleware that modifies the response's content.

Operation:
----------

Validation only kicks in under the following conditions:

- DEBUG == True
- HTML_VALIDATION_ENABLE == True (default)
- REMOTE_ADDR in INTERNAL_IPS
- 'html' in Content-Type
- 'disable-validation' not in GET
- request.is_ajax() == False
- type(response) == HttpResponse
- request.path doesn't match HTML_VALIDATION_URL_IGNORE

To bypass the check, ?disable-validation can be appended to any URI.

Settings:
---------

- HTML_VALIDATION_ENABLE     - Turns middleware on/off. Default: True
- HTML_VALIDATION_ENCODING   - Default: 'utf8'
- HTML_VALIDATION_DOCTYPE    - Default: 'strict'
- HTML_VALIDATION_IGNORE     - Default: ['trimming empty <option>',
                               '<table> lacks "summary" attribute']
- HTML_VALIDATION_URL_IGNORE - List of regular expressions to check
                               request.path against when deciding if we
                               should process the request. Default: []
- HTML_VALIDATION_XHTML      - Default: True
- HTML_VALIDATION_OPTIONS    - Options that get passed to tidy; overrides
                               the previous settings. Default: based on
                               the above settings

For more information about the settings, read the source and consult tidy's
documentation.

History
-------

December 19, 2009:
- Fix empty HTML_VALIDATION_URL_IGNORE. Thanks, iqqmuT

July 12, 2009:
- Ignore ajax requests
- Add HTML_VALIDATION_URL_IGNORE setting

February 6, 2009:
- Initial release
'''
import re

import tidy

from django.conf import settings
from django.core.exceptions import MiddlewareNotUsed
from django.http import HttpResponse, HttpResponseServerError
from django.template import Context, Template


class HTMLValidationMiddleware(object):
    '''
    Checks that the response is valid HTML with proper Unicode. In the
    event of a failed check we show a simple page listing the HTML source
    and which errors need to be fixed.
    '''

    # Validation errors to ignore. Can be overridden with the
    # HTML_VALIDATION_IGNORE setting.
    ignore = [
        'trimming empty <option>',
        '<table> lacks "summary" attribute',
    ]

    # Options for tidy. Can be overridden with the HTML_VALIDATION_OPTIONS
    # setting.
    options = {
        'doctype': getattr(settings, 'HTML_VALIDATION_DOCTYPE', 'strict'),
        'output_xhtml': getattr(settings, 'HTML_VALIDATION_XHTML', True),
        'input_encoding': getattr(settings, 'HTML_VALIDATION_ENCODING', 'utf8'),
    }

    def __init__(self):
        # Only active in development, and only when not explicitly disabled.
        if not settings.DEBUG or not getattr(settings, 'HTML_VALIDATION_ENABLE', True):
            raise MiddlewareNotUsed

        self.options = getattr(settings, 'HTML_VALIDATION_OPTIONS', self.options)
        self.ignore = set(getattr(settings, 'HTML_VALIDATION_IGNORE', self.ignore))
        self.ignore_regexp = self._build_ignore_regexp(
            getattr(settings, 'HTML_VALIDATION_URL_IGNORE', []))
        self.template = Template(self.HTML_VALIDATION_TEMPLATE.strip())

    def process_response(self, request, response):
        if not self._should_validate(request, response):
            return response

        errors = self._validate(response)

        if not errors:
            return response

        context = self._get_context(request, response, errors)

        return HttpResponseServerError(self.template.render(context))

    def _build_ignore_regexp(self, urls):
        # An empty list must not produce a pattern, since '()' matches
        # every path and would switch validation off everywhere.
        if not urls:
            return None

        urls = [r'(%s)' % url for url in urls]

        return re.compile(r'(%s)' % r'|'.join(urls))

    def _should_validate(self, request, response):
        return ('html' in response['Content-Type'] and
                'disable-validation' not in request.GET and
                not request.is_ajax() and
                (not self.ignore_regexp or
                 not self.ignore_regexp.search(request.path)) and
                request.META['REMOTE_ADDR'] in settings.INTERNAL_IPS and
                type(response) == HttpResponse)

    def _validate(self, response):
        errors = tidy.parseString(response.content, **self.options).errors

        return self._filter_errors(errors)

    def _filter_errors(self, errors):
        # Drop the messages listed in self.ignore.
        return [e for e in errors if e.message not in self.ignore]

    def _get_context(self, request, response, errors):
        # Map line numbers to messages so the template can flag bad lines.
        error_dict = dict((e.line, e.message) for e in errors)

        lines = []
        for i, line in enumerate(response.content.split('\n')):
            lines.append((line, error_dict.get(i + 1, False)))

        # The template refers to request.path_info, so the request has to
        # be part of the context.
        return Context({'errors': errors,
                        'lines': lines,
                        'request': request})

    HTML_VALIDATION_TEMPLATE = """
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>HTML validation error at {{ request.path_info|escape }}</title>
<meta name="robots" content="NONE,NOARCHIVE">
<style type="text/css">
html * { padding: 0; margin: 0; }
body * { padding: 10px 20px; }
body * * { padding: 0; }
body { font: small sans-serif; background: #eee; }
body>div { border-bottom: 1px solid #ddd; }
h1 { font-weight: normal; margin-bottom: 0.4em; }
table { border: none; border-collapse: collapse; width: 100%; }
td, th { vertical-align: top; padding: 2px 3px; }
th { width: 6em; text-align: right; color: #666; padding-right: 0.5em; }
#info { background: #f6f6f6; }
#info th { width: 3em; }
#summary { background: #ffc; }
#explanation { background: #eee; border-bottom: 0px none; }
.meta { margin: 1em 0; }
.error { background: #FEE }
</style>
</head>
<body>
<div id="summary">
<h1>HTML validation error</h1>
<p>
Your HTML did not validate. If this page contains user content that
might be the problem. Please fix the following:
</p>
<table class="meta">
{% for error in errors %}
<tr>
<th>Line: <a href="#line{{ error.line }}">{{ error.line }}</a></th>
<td>{{ error.message|escape }}</td>
</tr>
{% endfor %}
</table>
<p>
If you want to bypass this warning, click <a href="?disable-validation">
here</a>. Please note that this warning will persist until you fix the
problems mentioned above.
</p>
</div>
<div id="info">
<table>
{% for line,error in lines %}
<tr{% if error %} class="error"{% endif %}>
<th id="line{{ forloop.counter }}">
{{ forloop.counter|stringformat:"03d" }}
</th>
<td{% if error %} title="{{ error }}"{% endif %}>
<pre>{{ line }}</pre>
</td>
</tr>
{% endfor %}
</table>
</div>
<div id="explanation">
<p>
You're seeing this error because you have not set
<code>HTML_VALIDATION_ENABLE = False</code> in your Django settings file.
Change that to <code>False</code>, and Django will stop validating your
HTML.
</p>
</div>
</body>
</html>"""
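
As a rough illustration of the installation and settings notes in the docstring, a development settings file might contain something like the following. This is only a sketch under assumptions: it assumes the snippet was saved as djangovalidation/middleware.py, it uses the MIDDLEWARE_CLASSES setting name from Django releases of that period, and the IP address and the /admin/ pattern are placeholder values rather than defaults.

```python
# settings.py (development only) -- illustrative values, not defaults.
DEBUG = True                     # the middleware disables itself otherwise
INTERNAL_IPS = ('127.0.0.1',)    # validation only runs for these clients

MIDDLEWARE_CLASSES = (
    # ... your other middleware ...
    # Assumes the snippet lives in djangovalidation/middleware.py.
    'djangovalidation.middleware.HTMLValidationMiddleware',
)

HTML_VALIDATION_ENABLE = True
HTML_VALIDATION_DOCTYPE = 'strict'
HTML_VALIDATION_XHTML = True

# Regular expressions matched against request.path (hypothetical example).
HTML_VALIDATION_URL_IGNORE = [r'^/admin/']

# Tidy messages to ignore; these two are the snippet's own defaults.
HTML_VALIDATION_IGNORE = [
    'trimming empty <option>',
    '<table> lacks "summary" attribute',
]
```

Setting HTML_VALIDATION_OPTIONS as well would replace the doctype, XHTML and encoding values, since the docstring notes that it overrides the previous settings.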
Comments
Really excellent and well thought out. Thanks!
This could benefit from an additional setting to ignore some URL patterns (since the Admin doesn't validate :) I added the following to the _should_validate() method, but this can probably be improved to be less convoluted:
#
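
The code from this comment is not shown here. As a rough sketch of the general idea only (not the commenter's version), a list of URL patterns can be joined into one regular expression and searched against the request path, much as the snippet's _build_ignore_regexp() does. The function names and the example patterns below are hypothetical.

```python
import re


def build_ignore_regexp(patterns):
    """Combine several URL patterns into one compiled regexp (None if empty)."""
    if not patterns:
        return None
    return re.compile('|'.join('(%s)' % p for p in patterns))


def is_ignored(path, ignore_regexp):
    """Return True when the path matches one of the ignore patterns."""
    return bool(ignore_regexp and ignore_regexp.search(path))


# Hypothetical patterns: skip the admin and static media.
ignore = build_ignore_regexp([r'^/admin/', r'^/media/'])

print(is_ignored('/admin/auth/user/', ignore))  # True
print(is_ignored('/blog/2009/', ignore))        # False
```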
Good to see that there are alternatives out there :)
Personally I prefer a KISS approach to these types of utilities. Not having to think about keeping state and adding development stuff to my URLconf is a big plus.
This of course comes at the cost of being in the developer's face all the time, potentially disturbing their workflow (then again, forcing the habit of writing valid HTML is a good thing).
Which one you prefer comes down to personal preference, since both approaches have their merits :)
#
Thanks for a really nice snippet!
There is one minor bug, though: if HTML_VALIDATION_URL_IGNORE is empty, as it is by default, then validation will never happen. Here is a little fix suggestion for that:
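
The suggested patch is not shown here. To illustrate why an empty list is a problem, here is a small sketch that assumes the combined pattern is built by joining the list with "|", as _build_ignore_regexp() does; naive_ignore_regexp and guarded_ignore_regexp are hypothetical names used only for this example.

```python
import re


def naive_ignore_regexp(patterns):
    # With an empty list this compiles the pattern '()', which matches
    # the empty string and therefore matches every path.
    return re.compile('(%s)' % '|'.join('(%s)' % p for p in patterns))


def guarded_ignore_regexp(patterns):
    # Returning None for an empty list lets the caller skip the check.
    if not patterns:
        return None
    return naive_ignore_regexp(patterns)


print(naive_ignore_regexp([]).pattern)                           # ()
print(naive_ignore_regexp([]).search('/any/path/') is not None)  # True
print(guarded_ignore_regexp([]) is None)                         # True
```

With the naive version every request path counts as ignored, which is why validation never ran with the default empty setting.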
#
Finally got around to updating the actual snippet with the provided patch :-)
#
It doesn't work with HTML5 right now. That isn't the fault of this snippet, but of libtidy. Once libtidy is updated, this snippet will be useful again.
#
There is a little bug in this snippet: the template expects a "request" variable, which is not passed in the Context object.
#