Character encoding fix

from django.db import models
from django.db.models import signals
from django.dispatch import dispatcher


def kill_gremlins(text):
    # Undo cp1252 text that was mis-decoded as ISO-8859-1.
    return unicode(text).encode('iso-8859-1').decode('cp1252')


def charstrip(sender, instance):
    # Clean every non-empty CharField and TextField before the instance is saved.
    for field in instance._meta.fields:
        if type(field) in (models.TextField, models.CharField):
            value = getattr(instance, field.name)
            if value:
                setattr(instance, field.name, kill_gremlins(value))


# Run charstrip before any model instance is saved.
dispatcher.connect(charstrip, signal=signals.pre_save)
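
To make the effect of kill_gremlins concrete, here is a small sketch of the same round trip written for Python 3, where str is already Unicode and the unicode() call is not needed. The sample string is an assumption chosen for illustration: it contains U+0093 and U+0094, which is what Windows-1252 curly quotes become when their bytes are wrongly decoded as ISO-8859-1.

```python
def kill_gremlins(text):
    # Re-encode to the original bytes, then decode those bytes as Windows-1252.
    return str(text).encode("iso-8859-1").decode("cp1252")


# Hypothetical sample: curly quotes that were mis-decoded as ISO-8859-1.
broken = "\x93quoted\x94"
fixed = kill_gremlins(broken)
print(repr(fixed))

# Characters that mean the same thing in both encodings pass through unchanged.
unchanged = kill_gremlins("caf\xe9")
```

Only the code points U+0080 to U+009F change in this round trip; everything else in the Latin-1 range maps back to itself.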


Comments

gabor (on April 25, 2008):

Very nice, clean approach with the signals, but the kill_gremlins function seems a little over-complex to me.

I mean, can't we achieve the same with:

def kill_gremlins(text):
    return text.encode('iso-8859-1').decode('cp1252')

(Assuming that we are dealing with mishandled unicode strings.)

mrtron (on April 26, 2008):

Yes, that does appear to work correctly. I thought that route would drop the non-ISO-compatible characters, but it appears to make the conversion correctly. Very nice, I will update the method.

effbot (on July 16, 2008):

Note that encode("iso-8859-1") does not handle non-latin-1 characters in a Unicode string (obviously):

>>> import unicodedata
>>> s = u"\u1234" # random unicode character
>>> unicodedata.name(s)
'ETHIOPIC SYLLABLE SEE'
>>> s.encode("iso-8859-1").decode("cp1252")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u1234' in position 0: ordinal not in range(256)
>>>

Maybe the use case for this snippet is more limited, but it's not a full replacement for my (rather dated) code.
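
One way to sidestep the UnicodeEncodeError shown above is to convert only the characters that can actually be gremlins and leave the rest of the string alone. The following is a minimal Python 3 sketch under that assumption. The name kill_gremlins_safe is hypothetical, and this is not the older code referred to in the comment, just an illustration of the idea.

```python
def kill_gremlins_safe(text):
    """Map U+0080..U+009F to their Windows-1252 meaning; keep everything else."""
    out = []
    for ch in text:
        if "\x80" <= ch <= "\x9f":
            try:
                ch = ch.encode("iso-8859-1").decode("cp1252")
            except UnicodeDecodeError:
                # A few bytes (0x81, for example) are undefined in cp1252:
                # keep the original character rather than failing.
                pass
        out.append(ch)
    return "".join(out)


# Hypothetical sample mixing gremlins with a non-Latin-1 character.
mixed = kill_gremlins_safe("\x93hi\x94 \u1234")
```

Because characters outside U+0080..U+009F are never encoded, a string containing something like u"\u1234" passes through without raising.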
