1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | def softwraphtml(value, max_line_length=20):
import re
whitespace_re = re.compile('\s')
new_value = []
unbroken_chars = 0
in_tag = False
in_xhtml_entity = False
for idx, char in enumerate(value):
if char == '<':
in_tag = True
elif char == '>':
in_tag = False
unbroken_chars = 0
elif char == '&' and not in_tag:
in_xhtml_entity = True
elif char == ';' and in_xhtml_entity:
in_xhtml_entity = False
elif whitespace_re.match(char):
unbroken_chars = 0
new_value.append(char)
if not in_xhtml_entity:
if unbroken_chars >= max_line_length-1 and not in_tag:
new_value.append("<wbr/>")
unbroken_chars = 0
else:
unbroken_chars += 1
return ''.join(new_value)
|
More like this
- Auto HTML Linebreak filter by punteney 5 years, 1 month ago
- Updated version of StripWhitespaceMiddleware (v1.1) by sleepycal 1 year, 11 months ago
- Django filter stack to cleanup WYSIWYG output by jbergantine 1 year, 8 months ago
- make an unordered html list by techiegurl 4 years, 12 months ago
- HTML to text filter by MasonM 4 years, 9 months ago
Comments
How about inserting of the [HTML_REMOVED] in the xhtml entities? This may be a big issue if trying to output a correct xhtml code.
For example, any user can place a long URL containing [HTML_REMOVED] in the comments.
#
That's a very good point. As I said, it's very basic still, but you're right -- it should treat xhtml entities atomically. Thanks for pointing that out.
#