- Author:
- Mirrorball
- Posted:
- August 1, 2007
- Language:
- Python
- Version:
- .96
- Score:
- 0 (after 0 ratings)
I just converted the autop filter from Drupal (which is itself based on a WordPress filter) from PHP to Python. I had to change the format of the regular expressions a bit and make them raw strings, but otherwise the function is unchanged, so it should behave exactly like the original.
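As a rough illustration of the kind of regex changes mentioned above, the sketch below pairs a few PHP-style `preg_*` calls with Python equivalents. The PHP lines in the comments are written the way such calls are usually spelled and are not quoted from the Drupal source; the Python patterns are the ones used in `autop()`. The function names `split_protected`, `move_blockquote` and `make_paragraphs` are hypothetical.

```python
import re

# PHP (approximate): preg_split('@(</?(?:pre|script|style)[^>]*>)@i', $text, -1,
#                               PREG_SPLIT_DELIM_CAPTURE)
# Python: drop the "@" delimiters, turn the trailing "i" modifier into an
# inline (?i) flag, and rely on the capturing group to keep the delimiters.
def split_protected(text):
    return re.split(r'(?i)(</?(?:pre|script|style)[^>]*>)', text)


# PHP (approximate): preg_replace('|<p><blockquote([^>]*)>|i', "<blockquote$1><p>", $chunk)
# Python: the "$1" back-reference becomes "\1" inside a raw string.
def move_blockquote(chunk):
    return re.sub(r'(?i)<p><blockquote([^>]*)>', r'<blockquote\1><p>', chunk)


# PHP (approximate): preg_replace('/\n?(.+?)(?:\n\s*\n|\z)/s', "<p>$1</p>\n", $chunk)
# Python: "/s" becomes (?s), and PHP's \z (absolute end of string) is \Z.
def make_paragraphs(chunk):
    return re.sub(r'(?s)\n?(.+?)(?:\n\s*\n|\Z)', r'<p>\1</p>\n', chunk)
```

For example, `split_protected('a<PRE>b</pre>')` gives `['a', '<PRE>', 'b', '</pre>', '']`: like PHP's `PREG_SPLIT_DELIM_CAPTURE`, the list alternates literals and captured tags and starts and ends with a literal, padding with an empty string where needed.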
import re

def autop(text):
    '''
    Convert line breaks into <p> and <br> in an intelligent fashion.
    Adapted from Drupal.
    '''
    # All block level tags
    block = '(?:table|thead|tfoot|caption|colgroup|tbody|tr|td|th|div|dl|dd|dt|ul|ol|li|pre|select|form|blockquote|address|p|h[1-6])'
    # Split at <pre>, <script>, <style> and </pre>, </script>, </style> tags.
    # We don't apply any processing to the contents of these tags to avoid messing
    # up code. We look for matched pairs and allow basic nesting. For example:
    # "processed <pre> ignored <script> ignored </script> ignored </pre> processed"
    chunks = re.split(r'(?i)(</?(?:pre|script|style)[^>]*>)', text)
    # Note: PHP ensures the array consists of alternating delimiters and literals
    # and begins and ends with a literal (inserting NULL as required).
    # The same holds for Python's re.split with a capturing group, which
    # inserts empty strings as required.
    ignore = False
    ignoretag = ''
    output = ''
    for i, chunk in enumerate(chunks):
        if i % 2:
            # Opening or closing tag?
            is_open = (chunk[1] != '/')
            # Tag name: skip "<" or "</", then cut at the first space or ">".
            tag = re.split(r'[ >]', chunk[2 - is_open:], maxsplit=1)[0]
            if not ignore:
                if is_open:
                    ignore = True
                    ignoretag = tag
            # Only allow a matching tag to close it.
            elif not is_open and ignoretag == tag:
                ignore = False
                ignoretag = ''
        elif not ignore:
            chunk = re.sub(r'\n*$', '', chunk) + "\n\n" # just to make things a little easier, pad the end
            chunk = re.sub(r'<br />\s*<br />', r"\n\n", chunk)
            chunk = re.sub(r'(<' + block + '[^>]*>)', r"\n\1", chunk) # Space things out a little
            chunk = re.sub(r'(</' + block + '>)', r"\1\n\n", chunk) # Space things out a little
            chunk = re.sub(r"\n\n+", r"\n\n", chunk) # take care of duplicates
            chunk = re.sub(r'(?s)\n?(.+?)(?:\n\s*\n|\Z)', r"<p>\1</p>\n", chunk) # make paragraphs, including one at the end
            chunk = re.sub(r'<p>\s*</p>\n', r'', chunk) # under certain strange conditions it could create a P of entirely whitespace
            chunk = re.sub(r"<p>(<li.+?)</p>", r"\1", chunk) # problem with nested lists
            chunk = re.sub(r'(?i)<p><blockquote([^>]*)>', r"<blockquote\1><p>", chunk)
            chunk = chunk.replace('</blockquote></p>', '</p></blockquote>')
            chunk = re.sub(r'<p>\s*(</?' + block + '[^>]*>)', r"\1", chunk)
            chunk = re.sub(r'(</?' + block + '[^>]*>)\s*</p>', r"\1", chunk)
            chunk = re.sub(r'(?<!<br />)\s*\n', r"<br />\n", chunk) # make line breaks
            chunk = re.sub(r'(</?' + block + '[^>]*>)\s*<br />', r"\1", chunk)
            chunk = re.sub(r'<br />(\s*</?(?:p|li|div|th|pre|td|ul|ol)>)', r'\1', chunk)
            chunk = re.sub(r'&([^#])(?![A-Za-z0-9]{1,8};)', r'&amp;\1', chunk) # encode ampersands that don't look like an entity
        output += chunk
    return output
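To see which pieces of the input `autop()` actually rewrites, the splitting and matched-pair logic can be pulled out on its own. The sketch below is a hypothetical helper, `classify_chunks`, that is not part of the original snippet; it reuses the same split pattern and tag-name extraction and reports, for each chunk, whether `autop()` would run the paragraph and line-break substitutions on it.

```python
import re

# Same delimiter pattern that autop() splits on.
PROTECTED_TAGS = re.compile(r'(?i)(</?(?:pre|script|style)[^>]*>)')


def classify_chunks(text):
    """Return (chunk, processed) pairs mirroring autop()'s ignore logic.

    Hypothetical helper: the captured tags themselves are never processed;
    a literal chunk is processed only while no <pre>/<script>/<style>
    region is open.
    """
    result = []
    ignore = False
    ignoretag = ''
    for i, chunk in enumerate(PROTECTED_TAGS.split(text)):
        if i % 2:  # odd indices hold the captured tags
            is_open = chunk[1] != '/'
            # Tag name: drop "<" or "</", then cut at the first space or ">".
            name_part = chunk[1:] if is_open else chunk[2:]
            tag = re.split(r'[ >]', name_part, maxsplit=1)[0]
            if not ignore:
                if is_open:
                    ignore, ignoretag = True, tag
            elif not is_open and tag == ignoretag:
                # Only the tag that opened the region may close it.
                ignore, ignoretag = False, ''
            result.append((chunk, False))
        else:
            result.append((chunk, not ignore))
    return result


if __name__ == '__main__':
    sample = ("processed <pre> ignored <script> ignored </script> "
              "ignored </pre> processed")
    print([c for c, p in classify_chunks(sample) if p])
    # ['processed ', ' processed']
```

The `<script>` inside `<pre>` does not end the ignored region; only the matching `</pre>` does, which is the "basic nesting" described in the comment inside `autop()`. Extracting the tag name with a regex split on `[ >]` also lets tags with attributes, such as `<pre class="x">`, match their plain closing tag.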
More like this
- Template tag - list punctuation for a list of items by shapiromatron 10 months, 3 weeks ago
- JSONRequestMiddleware adds a .json() method to your HttpRequests by cdcarter 11 months ago
- Serializer factory with Django Rest Framework by julio 1 year, 5 months ago
- Image compression before saving the new model / work with JPG, PNG by Schleidens 1 year, 6 months ago
- Help text hyperlinks by sa2812 1 year, 7 months ago
Comments
Is there any reason I might be missing not to write the loop as:
...?
No, I just didn't know about the enumerate function, thanks.