Model merging function

Author: xaralis
Posted: December 1, 2010
Language: Python
Version: 1.2
Score: 2 (after 3 ratings)

Generic function to merge model instances. Useful when you need to merge duplicate models together, e.g. for users.

Based on http://djangosnippets.org/snippets/382/, with several enhancements:

  • Type checking: only Model subclasses can be merged, and all instances must be of the same model class
  • Handles symmetrical many-to-many relations: the original snippet failed in that case
  • Fills in blank attributes of the original when the duplicate has them set
  • Prepared for use outside the command line
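
The type checks from the first bullet can be tried in isolation. The sketch below is only an illustration: `Model` here is a plain stand-in class (an assumption, so the example runs without Django), and `check_mergeable`, `User` and `Poll` are hypothetical names that do not appear in the snippet itself.

```python
class Model(object):
    """Stand-in for django.db.models.Model (assumption, illustration only)."""


class User(Model):
    pass


class Poll(Model):
    pass


def check_mergeable(primary_object, alias_objects):
    """Hypothetical helper: apply the snippet's type checks, return aliases as a list."""
    # A single instance is wrapped in a list, as the snippet does.
    if not isinstance(alias_objects, list):
        alias_objects = [alias_objects]

    primary_class = primary_object.__class__
    if not issubclass(primary_class, Model):
        raise TypeError('Only django.db.models.Model subclasses can be merged')

    # Every alias must be an instance of the primary object's class.
    for alias_object in alias_objects:
        if not isinstance(alias_object, primary_class):
            raise TypeError('Only models of same class can be merged')
    return alias_objects
```

With these stand-ins, passing a `Poll` as an alias of a `User` raises `TypeError`, while passing one `User` or a list of them returns the list of aliases.
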
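    # Targets Django 1.2, per the snippet metadata. Later Django releases
    # replaced commit_on_success with transaction.atomic.
    from django.db import transaction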
    from django.db.models import get_models, Model
    from django.contrib.contenttypes.generic import GenericForeignKey
    
    @transaction.commit_on_success
    def merge_model_objects(primary_object, alias_objects=None, keep_old=False):
        """
        Use this function to merge model objects (e.g. Users, Organizations, Polls,
        etc.) and migrate all of the related fields from the alias objects to the
        primary object. Alias objects are deleted afterwards unless keep_old is True.
        
        Usage:
        from django.contrib.auth.models import User
        primary_user = User.objects.get(email='[email protected]')
        duplicate_user = User.objects.get(email='[email protected]')
        merge_model_objects(primary_user, duplicate_user)
        """
        # Accept a single instance, a list of instances, or nothing at all.
        if alias_objects is None:
            alias_objects = []
        elif not isinstance(alias_objects, list):
            alias_objects = [alias_objects]
        
        # check that all aliases are the same class as primary one and that
        # they are subclass of model
        primary_class = primary_object.__class__
        
        if not issubclass(primary_class, Model):
            raise TypeError('Only django.db.models.Model subclasses can be merged')
        
        for alias_object in alias_objects:
            if not isinstance(alias_object, primary_class):
                raise TypeError('Only models of same class can be merged')
        
        # Get a list of all GenericForeignKeys in all models
        # TODO: this is a bit of a hack, since the generics framework should provide a similar
        # method to the ForeignKey field for accessing the generic related fields.
        generic_fields = []
        for model in get_models():
            for field in model.__dict__.values():
                if isinstance(field, GenericForeignKey):
                    generic_fields.append(field)
                
        blank_local_fields = set([field.attname for field in primary_object._meta.local_fields if getattr(primary_object, field.attname) in [None, '']])
        
        # Loop through all alias objects and migrate their data to the primary object.
        for alias_object in alias_objects:
            # Migrate all foreign key references from alias object to primary object.
            for related_object in alias_object._meta.get_all_related_objects():
                # The variable name on the alias_object model.
                alias_varname = related_object.get_accessor_name()
                # The variable name on the related model.
                obj_varname = related_object.field.name
                related_objects = getattr(alias_object, alias_varname)
                for obj in related_objects.all():
                    setattr(obj, obj_varname, primary_object)
                    obj.save()
    
            # Migrate all many to many references from alias object to primary object.
            for related_many_object in alias_object._meta.get_all_related_many_to_many_objects():
                alias_varname = related_many_object.get_accessor_name()
                obj_varname = related_many_object.field.name
                
                if alias_varname is not None:
                    # standard case
                    related_many_objects = getattr(alias_object, alias_varname).all()
                else:
                    # special case, symmetrical relation, no reverse accessor
                    related_many_objects = getattr(alias_object, obj_varname).all()
                for obj in related_many_objects:
                    getattr(obj, obj_varname).remove(alias_object)
                    getattr(obj, obj_varname).add(primary_object)
    
            # Migrate all generic foreign key references from alias object to primary object.
            for field in generic_fields:
                filter_kwargs = {}
                filter_kwargs[field.fk_field] = alias_object._get_pk_val()
                filter_kwargs[field.ct_field] = field.get_content_type(alias_object)
                for generic_related_object in field.model.objects.filter(**filter_kwargs):
                    setattr(generic_related_object, field.name, primary_object)
                    generic_related_object.save()
                    
            # Try to fill all missing values in primary object by values of duplicates
            filled_up = set()
            for field_name in blank_local_fields:
                val = getattr(alias_object, field_name) 
                if val not in [None, '']:
                    setattr(primary_object, field_name, val)
                    filled_up.add(field_name)
            blank_local_fields -= filled_up
                
            if not keep_old:
                alias_object.delete()
        primary_object.save()
        return primary_object
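
The blank-filling step at the end of the function can be sketched without Django to show how it behaves with several duplicates: the set of blank fields is computed once from the primary object, and the first alias that has a non-blank value for a field wins. This is a minimal sketch under stated assumptions: `Record` and `fill_blanks` are hypothetical names, and plain attributes stand in for model fields.

```python
BLANK = (None, '')


class Record(object):
    """Hypothetical stand-in for a model instance; attributes act as fields."""

    def __init__(self, **values):
        self.__dict__.update(values)


def fill_blanks(primary, aliases, field_names):
    """Copy values from aliases into fields that are blank on primary."""
    # Fields that are None or '' on the primary object at the start.
    blank = set(name for name in field_names if getattr(primary, name) in BLANK)
    for alias in aliases:
        filled = set()
        for name in blank:
            val = getattr(alias, name)
            if val not in BLANK:
                setattr(primary, name, val)
                filled.add(name)
        # Once filled, a field is not overwritten by later aliases.
        blank -= filled
    return primary


primary = Record(email='primary@example.com', phone='', city=None)
first = Record(email='first@example.com', phone='111', city='')
second = Record(email='second@example.com', phone='222', city='Prague')
fill_blanks(primary, [first, second], ['email', 'phone', 'city'])
```

Here `primary.email` is left alone because it was already set, `primary.phone` comes from `first`, and `primary.city` comes from `second` because `first` had it blank. Values already present on the primary object are never overwritten.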

Comments

NicholasMerrill (on December 4, 2014):

I made a slight modification to handle one-to-one fields, recursively merging related one-to-one objects as well.

YPCrumble (on December 16, 2017):

This was a big help to me in creating the Django Extensions' merge_model_instances management command. Thanks for posting!

After writing the code for the extension above I also found Django Super Deduper which might be of help to others looking to merge models.
