Automatically slugify slug fields in your models

Author:: Aliquip
Posted:: March 10, 2007
Language:: Python
Version:: Pre .96
Score:: 3 (after 3 ratings)

I suppose I'm kind of stubborn, but I prefer to use underscores to replace spaces and other characters. Of course, that shouldn't hold you back from using the built-in slugify filter :)

Forcing the slug to use ASCII equivalents:

Transforming titles like "Äës" into slugs like "aes" took some trial and error, but it now works for me. I hope _string_to_slug(s) proves to be a reasonably stable solution. In the worst case such characters are simply dropped from the slug, which I consider acceptable.

Other ways of dealing with this problem can be found at Latin1 to ASCII at Activestate or in the comments below.
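
As a rough illustration of the transliteration described above, here is a minimal sketch in Python 3 syntax (the snippet itself targets the Python 2 era of Django). The name `ascii_slug` is hypothetical and not part of the snippet; it only mirrors the steps that `_string_to_slug` takes.

```python
import re
import unicodedata


def ascii_slug(text):
    """Hypothetical helper mirroring _string_to_slug, written for Python 3."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with 'ignore' drops the marks and any other non-ASCII character.
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of remaining unwanted characters into one underscore.
    return re.sub(r"[^a-z0-9-]+", "_", ascii_text.lower()).strip("_")


print(ascii_slug("Äës"))  # prints "aes"
```

Characters that have no decomposition, such as "ß", are dropped entirely, which is the worst case mentioned above.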

How to use:

The slug fields in your model must have prepopulate_from set; the fields listed in it are used to build the slug.

To prevent duplicates, a number is appended to the slug if another, earlier object already uses that slug for the same field. There is probably a cleaner way to tell whether a new db entry is being created or an existing one updated, but the db back-end is something of a black box to me. At least this works ;)
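
The numbering rule can be sketched without a database. The function below is a hypothetical stand-in for `_get_unique_value` that takes a plain list of existing slugs instead of querying a model; the name `next_free_slug` and the list argument are assumptions made for illustration only.

```python
import re


def next_free_slug(proposal, existing, separator="_"):
    """Hypothetical stand-in for _get_unique_value that takes a plain list."""
    if proposal not in existing:
        return proposal
    # Collect the numeric suffixes already in use, e.g. 'test_7' -> 7.
    pattern = re.compile(
        r"^%s%s(\d+)$" % (re.escape(proposal), re.escape(separator))
    )
    numbers = [int(m.group(1)) for m in map(pattern.match, existing) if m]
    # The unnumbered slug counts as number 1, so numbering starts at 2.
    return "%s%s%d" % (proposal, separator, max(numbers, default=1) + 1)
```

With existing slugs "test", "test_2" and "test_7", the proposal "test" becomes "test_8": the function continues after the largest number rather than filling gaps.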

I chose not to alter the slug on an update, to keep URLs bookmarkable. You could extend this further by only setting the slug field if it hasn't been assigned a value yet.
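
The "only set the slug if it is still empty" variation could look roughly like this. It is a sketch on plain Python objects, not Django models; `Article`, `fill_slug_if_empty` and `simple_slug` are hypothetical names introduced for the example.

```python
class Article:
    """Hypothetical plain object standing in for a model instance."""

    def __init__(self, title, slug=""):
        self.title = title
        self.slug = slug


def fill_slug_if_empty(instance, make_slug):
    """Set instance.slug from the title only when no slug is stored yet."""
    if not instance.slug:
        instance.slug = make_slug(instance.title)
    return instance.slug


def simple_slug(title):
    # Deliberately naive stand-in for the real slug function.
    return title.lower().replace(" ", "_")


new = Article("Hello World")
old = Article("Hello World, revised", slug="hello_world")
fill_slug_if_empty(new, simple_slug)  # slug is generated
fill_slug_if_empty(old, simple_slug)  # existing slug is kept
```

Both objects end up with the slug "hello_world": the new one because it was generated, the old one because its stored value was left alone despite the changed title.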

import re
from django.db import models

class SlugNotCorrectlyPrePopulated(Exception): 
    pass 

def _string_to_slug(s):    
    raw_data = s
    # normalize string as proposed on http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/251871
    # by Aaron Bentley, 2006/01/02
    try:
        import unicodedata        
        raw_data = unicodedata.normalize('NFKD', raw_data.decode('utf-8', 'replace')).encode('ascii', 'ignore')
    except:
        pass
    return re.sub(r'[^a-z0-9-]+', '_', raw_data.lower()).strip('_')
    
# as proposed by Archatas (http://www.djangosnippets.org/users/Archatas/)
def _get_unique_value(model, proposal, field_name="slug", instance_pk=None, separator="-"):
    """ Returns unique string by the proposed one.
    Optionally takes:
    * field name which can  be 'slug', 'username', 'invoice_number', etc.
    * the primary key of the instance to which the string will be assigned.
    * separator which can be '-', '_', ' ', '', etc.
    By default, for proposal 'example' returns strings from the sequence:
        'example', 'example-2', 'example-3', 'example-4', ...
    """
    if instance_pk:
        similar_ones = model.objects.filter(**{field_name + "__startswith": proposal}).exclude(pk=instance_pk).values(field_name)
    else:
        similar_ones = model.objects.filter(**{field_name + "__startswith": proposal}).values(field_name)
    similar_ones = [elem[field_name] for elem in similar_ones]
    if proposal not in similar_ones:
        return proposal
    else:
        numbers = []
        for value in similar_ones:
            # escape both parts so regex metacharacters in them are matched literally
            match = re.match(r'^%s%s(\d+)$' % (re.escape(proposal), re.escape(separator)), value)
            if match:
                numbers.append(int(match.group(1)))
        if len(numbers)==0:
            return "%s%s2" % (proposal, separator)
        else:
            largest = sorted(numbers)[-1]
            return "%s%s%d" % (proposal, separator, largest + 1)

def _get_fields_and_data(model):
    opts = model._meta
    slug_fields = []
    for f in opts.fields:
        if isinstance(f, models.SlugField):
            if not f.prepopulate_from:
                raise SlugNotCorrectlyPrePopulated("Slug for %s is not prepopulated" % f.name)
            prepop = []
            for n in f.prepopulate_from:
                if not hasattr(model, n):
                    raise SlugNotCorrectlyPrePopulated("Slug for %s is to be prepopulated from %s, yet %s.%s does not exist" % (f.name, n, type(model), n))
                else:
                    prepop.append(getattr(model, n))
            slug_fields.append([f , "_".join(prepop)])
    return slug_fields
    
def slugify(sender, instance, signal, *args, **kwargs):    
    for slugs in _get_fields_and_data(instance):    
        # slugs is a [field, source_text] pair; build the slug from the source text
        slug = _string_to_slug(slugs[1])
        try:
            # See if object is new
            # To prevent altering urls, don't update slug on existing objects
            sender.objects.get(pk=instance._get_pk_val())
        except:
            slug = _get_unique_value(instance.__class__, slug, slugs[0].name, separator="_")
            setattr(instance, slugs[0].name, slug)


# ===========================
# To attach it to your model:
# ===========================
#
# from django.db.models import signals
# from django.dispatch import dispatcher
#
# dispatcher.connect(_package_.slugify, signal=signals.pre_save, sender=_your_model_)

Comments

Archatas (on March 11, 2007):

The problem with your code is that it accesses the database as many times as there are existing slugs with the same beginning. For example, if there are 100 objects created by different users and called "test", "test_1", ... "test_99", then 101 DB queries will be performed to get the next free slug. Another undesirable feature is that numbering starts from 0 (it is not humanized).

I suggest you integrate a call to the following, more generic function for getting a unique value for the slug:

import re

def get_unique_value(model, proposal, field_name="slug", instance_pk=None, separator="-"):
    """ Returns unique string by the proposed one.
    Optionally takes:
    * field name which can  be 'slug', 'username', 'invoice_number', etc.
    * the primary key of the instance to which the string will be assigned.
    * separator which can be '-', '_', ' ', '', etc.
    By default, for proposal 'example' returns strings from the sequence:
        'example', 'example-2', 'example-3', 'example-4', ...
    """
    if instance_pk:
        similar_ones = model.objects.filter(**{field_name + "__startswith": proposal}).exclude(pk=instance_pk).values(field_name)
    else:
        similar_ones = model.objects.filter(**{field_name + "__startswith": proposal}).values(field_name)
    similar_ones = [elem[field_name] for elem in similar_ones]
    if proposal not in similar_ones:
        return proposal
    else:
        numbers = []
        for value in similar_ones:
            match = re.match(r'^%s%s(\d+)$' % (proposal, separator), value)
            if match:
                numbers.append(int(match.group(1)))
        if len(numbers)==0:
            return "%s%s2" % (proposal, separator)
        else:
            largest = sorted(numbers)[-1]
            return "%s%s%d" % (proposal, separator, largest + 1)

I could create a new snippet for that, but I think it's more useful to have it here in one place.

Example usage:

from django.contrib.auth.models import User
from myapp.models import Page
unique_username = get_unique_value(User, "john_smith", field_name="username", separator="_")
unique_slug = get_unique_value(Page, "about-me")

Aliquip (on March 11, 2007):

Thanks, you're absolutely right. I never was overly concerned with performance as I required my titles to be unique anyways, but of course, the less db access the better ;)

As for starting the numbering at two, and thus counting the first, unnumbered slug as number one: that's something I never even considered; I just wanted to avoid starting at zero. Since there is already an unnumbered number 1, this is probably the preferable way to deal with it ;) It's a shame I didn't think of that earlier, as I now have about 1200 potential articles inhumanly slugified (though I doubt more than 5 slugs actually overlap, so it's no real problem) :)

ubernostrum (on March 13, 2007):

Also, on the subject of underscores versus hyphens: Google will treat a hyphen in a URL as a word separator, but not an underscore, which means that hyphenated slugs tend to yield better search placement.
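
For readers who want hyphens for that reason, the separator can be made a parameter. This is only a sketch: `slug_with_separator` is a hypothetical name, and unlike the snippet's regex it treats an existing hyphen as just another character to replace.

```python
import re


def slug_with_separator(text, separator="-"):
    """Hypothetical slug helper with a configurable word separator."""
    # Replace every run of characters other than a-z and 0-9 with the separator.
    slug = re.sub(r"[^a-z0-9]+", separator, text.lower())
    # Trim separators left over at either end.
    return slug.strip(separator)


print(slug_with_separator("About Me"))       # prints "about-me"
print(slug_with_separator("About Me", "_"))  # prints "about_me"
```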

Ciantic (on April 30, 2007):

I can't get this to work with newforms; with oldforms it works just fine. Apparently something in the way newforms dispatches makes all my slugs contain unicode, and the slugifier above then fails on duplicates and just throws an IntegrityError...

For example, if I print (for debugging) the slugs[1] variable in the slugify function defined in the snippet above, I get u'Jyv\xe4skyl\xe4', but if I edit the same object from the admin (which still uses oldforms) I get 'Jyv\xc3\xa4skyl\xc3\xa4' and it works.

To reproduce this error, save something with form.save() using a duplicate key.

Ciantic (on April 30, 2007):

Here is the fix:

    if isinstance(raw_data, unicode):
        raw_data = unicodedata.normalize('NFKD', raw_data).encode('ascii', 'ignore')
    else:
        # Just for oldforms:
        raw_data = unicodedata.normalize('NFKD', raw_data.decode('utf-8', 'replace')).encode('ascii', 'ignore')

This fix is necessary to make it work with oldforms and newforms at the same time (as in the admin). Without it, the snippet did not work with newforms.
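
In Python 3 terms, the same distinction is between `str` and `bytes`. The following is a sketch of the idea behind the fix, not a drop-in replacement for the Python 2 code; `to_ascii` is a hypothetical name, and byte input is assumed to be UTF-8 as in the oldforms case.

```python
import unicodedata


def to_ascii(raw_data):
    """Hypothetical helper: accept text or UTF-8 bytes, return ASCII text."""
    # Byte strings are assumed to be UTF-8 encoded and are decoded first.
    if isinstance(raw_data, bytes):
        raw_data = raw_data.decode("utf-8", "replace")
    # Decompose accented letters, then drop everything that is not ASCII.
    normalized = unicodedata.normalize("NFKD", raw_data)
    return normalized.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Jyv\xe4skyl\xe4"))            # text input
print(to_ascii(b"Jyv\xc3\xa4skyl\xc3\xa4"))   # UTF-8 byte input
```

Both calls print "Jyvaskyla", so the two input forms from the comment above end up as the same slug source.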
