Soft-wrap long lines

Author: Ubercore
Posted: April 3, 2008
Language: Python
Version: .96
Tags: filter line-break wbr softwrap
Score: 0 (after 0 ratings)

This filter naively parses HTML content and inserts <wbr/> tags into unbroken strings longer than max_line_length characters. It leaves content inside tags alone, so things like URLs in attribute values are unaltered. XHTML entities are treated as atomic, and whitespace is detected with a regex.

It assumes well-formed HTML.
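
The filter tracks tags and entities with simple on/off flags, so an unclosed `<` or a bare `&` that is never followed by `;` would leave a flag set and suppress wrapping for the rest of the string. A minimal sketch of a pre-check for those two cases follows. The name `looks_well_formed` and the entity pattern are assumptions made for illustration; they are not part of the snippet, and this is not a real HTML validator.

```python
import re

# Hypothetical helper, not part of the original snippet.
# Matches named (&amp;), decimal (&#169;) and hex (&#xA9;) entities.
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def looks_well_formed(value):
    """Return False for an unclosed '<' or a '&' in text that starts no entity."""
    in_tag = False
    i = 0
    while i < len(value):
        char = value[i]
        if in_tag:
            # Inside a tag, only the closing '>' matters; '&' is ignored here.
            if char == '>':
                in_tag = False
        elif char == '<':
            in_tag = True
        elif char == '&':
            match = ENTITY_RE.match(value, i)
            if match is None:
                return False  # bare ampersand in text
            i = match.end()  # skip the whole entity
            continue
        i += 1
    return not in_tag  # False if a tag was still open at the end


print(looks_well_formed('<p>fish &amp; chips</p>'))  # True
print(looks_well_formed('AT&T'))                     # False
```

A check like this only covers the two failure modes named above; a `>` inside an attribute value, for instance, would still go unnoticed.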

import re

# Raw string so that \s is not treated as a string escape.
WHITESPACE_RE = re.compile(r'\s')


def softwraphtml(value, max_line_length=20):
    """Insert <wbr/> into long unbroken runs of text, leaving tags and entities intact."""
    new_value = []
    unbroken_chars = 0
    in_tag = False
    in_xhtml_entity = False
    for char in value:
        if char == '<':
            in_tag = True
        elif char == '>':
            # End of a tag: the run of unbroken text starts over.
            in_tag = False
            unbroken_chars = 0
        elif char == '&' and not in_tag:
            in_xhtml_entity = True
        elif char == ';' and in_xhtml_entity:
            in_xhtml_entity = False
        elif WHITESPACE_RE.match(char):
            unbroken_chars = 0

        new_value.append(char)
        # Entities are atomic: never break inside one, and count it as one character.
        if not in_xhtml_entity:
            if unbroken_chars >= max_line_length - 1 and not in_tag:
                new_value.append("<wbr/>")
                unbroken_chars = 0
            else:
                unbroken_chars += 1
    return ''.join(new_value)
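
For comparison, the same rules (tags untouched, entities atomic, whitespace resets the count) can be sketched with a regex tokenizer instead of per-character flags. This is a variant written for illustration, not the author's code: `softwrap_tokens` and `TOKEN_RE` are hypothetical names, and it is not a drop-in replacement, since its counts can differ by one right after a tag or whitespace.

```python
import re

# Hypothetical variant for illustration; not part of the original snippet.
# A token is a whole tag, a whole entity, or any single character.
TOKEN_RE = re.compile(r"<[^>]*>|&[#\w]+;|.", re.DOTALL)


def softwrap_tokens(value, max_line_length=20):
    out = []
    run = 0  # visible characters since the last tag, whitespace or <wbr/>
    for token in TOKEN_RE.findall(value):
        out.append(token)
        if token[0] == '<' and token[-1] == '>':
            run = 0  # a complete tag is copied through and resets the run
        elif token.isspace():
            run = 0
        else:
            run += 1  # a plain character or a whole entity counts as one
            if run >= max_line_length:
                out.append('<wbr/>')
                run = 0
    return ''.join(out)


print(softwrap_tokens('abcdefgh', 5))     # abcde<wbr/>fgh
print(softwrap_tokens('ab&amp;cdef', 3))  # ab&amp;<wbr/>cde<wbr/>f
```

Because an entity only becomes a token when its closing `;` is present, a bare `&` is treated as an ordinary character here rather than switching off wrapping for the rest of the input.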

Comments

svetlyak (on April 4, 2008):

What about [HTML_REMOVED] being inserted inside XHTML entities? This could be a big issue when trying to output correct XHTML.

For example, any user can post a long URL containing [HTML_REMOVED] in the comments.

Ubercore (on April 6, 2008):

That's a very good point. As I said, it's very basic still, but you're right -- it should treat xhtml entities atomically. Thanks for pointing that out.
