- April 25, 2008
- python unicode encoding character latin1 django
There is a commonly encountered problem with Django and character sets. Windows applications such as Word and Outlook insert characters that are not valid ISO-8859-1, which causes problems when saving a Django model to a database with a Latin-1 encoding. Even if you are using a UTF-8 database encoding, these characters should still be converted to avoid display issues for end users. The topic is well covered at Effbot, which lists appropriate conversions for each of the problem characters.
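To make the problem concrete, here is a minimal sketch of how these characters arise and how a re-encoding step repairs them. It is written for Python 3 (the code later in this post targets Python 2), and the sample bytes are made up for illustration.

```python
# Hypothetical sample: bytes as a Windows application might produce them,
# using Windows-1252 curly quotes (0x93, 0x94) and an en dash (0x96).
raw = b"\x93Hello\x94 \x96 world"

# Reading those bytes as ISO-8859-1 does not fail, but it turns the
# Windows-1252 punctuation into invisible C1 control characters.
misread = raw.decode("iso-8859-1")

# Undo the misreading: go back to the original bytes, then decode them
# with the code page they were really written in.
repaired = misread.encode("iso-8859-1").decode("cp1252")

print(repr(misread))
print(repaired)
```

Note that the repaired string contains characters outside Latin-1, so it can no longer be encoded as ISO-8859-1 without a further substitution step.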
Correcting this for all of your Django models is another issue. Do you handle the re-encoding during the form validation? The save for each model? Create a base class that all your models need to inherit from?
The simplest solution I have come up with leverages Django's signals.
Combining the re-encoding method suggested at Effbot and the pre_save signal gives you the ability to convert all the problem characters right before the save occurs for any model.
Note: the kill_gremlins method has been replaced with Gabor's suggestion.
```python
from django.db.models import signals
from django.dispatch import dispatcher
from django.db import models


def kill_gremlins(text):
    # Undo a Latin-1 misreading of Windows-1252 text.
    return unicode(text).encode('iso-8859-1').decode('cp1252')


def charstrip(sender, instance):
    # Clean every populated CharField and TextField on the instance.
    for i_attr in instance._meta.fields:
        if type(i_attr) == models.TextField or type(i_attr) == models.CharField:
            if getattr(instance, i_attr.name):
                setattr(instance, i_attr.name,
                        kill_gremlins(getattr(instance, i_attr.name)))


# Run charstrip before any model is saved.
dispatcher.connect(charstrip, signal=signals.pre_save)
```
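One caveat with the one-line kill_gremlins above: it raises an error if the text already contains characters outside Latin-1, or one of the few byte values that Python's cp1252 codec leaves undefined. The following is a sketch of a more tolerant variant, not part of the original solution. It is written for Python 3, and the names `kill_gremlins_tolerant`, `_C1_RANGE` and `_fix` are hypothetical.

```python
import re

# Only the C1 control range (U+0080 to U+009F) can be misread cp1252 text.
_C1_RANGE = re.compile("[\x80-\x9f]")


def _fix(match):
    ch = match.group(0)
    try:
        # Reinterpret the single character as a Windows-1252 byte.
        return ch.encode("iso-8859-1").decode("cp1252")
    except UnicodeDecodeError:
        # Byte is undefined in cp1252 (for example 0x81): leave it alone.
        return ch


def kill_gremlins_tolerant(text):
    # Characters outside the C1 range are never touched, so text that is
    # already correct Unicode passes through unchanged.
    return _C1_RANGE.sub(_fix, text)
```

Because only the C1 range is rewritten, this variant can be applied repeatedly to the same value without changing it further, which matters when a model is saved more than once.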