I once needed to convert a Django project from PostgreSQL to SQLite. At that time I was either unaware of manage.py dumpdata/loaddata or it they didn't yet exist. I asked for advice on the #django IRC channel where ubernostrum came up with this plan:
simple process:
1) Select everything.
2) Pickle it.
3) Save to file.
4) Read file.
5) Unpickle.
6) Save to db.
:)
Or something like that.
First I thought it was funny, but then started to think about it and it made perfect sense. And so dbpickle.py was born.
I've used this script also for migrating schema changes to production databases.
For migration you can write plugins to hook on dbpickle.py's object retrieval and saving. This way you can add/remove/rename fields of objects on the fly when loading a dumped database. It's also possible to populate new fields with default values or even values computed based on the object's other properties.
A good way to use this is to create a database migration plugin for each schema change and use it with dbpickle.py to migrate the project.
See also original blog posting and my usenet posting wondering about the feasibility of this functionality with manage.py dumpdata/loaddata.
See trac site for version history.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 | #!/usr/bin/python
# -*- encoding: utf-8 -*-
"""
dbpickle.py is a Django tool for saving all database objects into a
file or loading and storing them back into the database. It's useful
for migrating from one database engine to another.
This version handles ForeignKeys which refer to the related model's
primary key. ManyToMany relations are now supported, but OneToOne
relations are not.
You can also use you own plugin for pre-processing objects before
saving them into the database. This is useful e.g. when a model has
changed and you're importing objects from an old version of the
database. The plugin should be a Python program with a
``pre_save_hook(obj)`` and/or ``pre_walk_hook(obj)`` function which
returns True if it modified the object, otherwise False.
Usage example
=============
Let's suppose your project is using PostgreSQL and the database has
been populated using the app.
$ PYTHONPATH=/home/myproject DJANGO_SETTINGS_MODULE=settings \
./dbpickle.py --dump --file=myproject.pickle
At this point you could change the settings so that SQLite is used
instead of PostgreSQL.
$ PYTHONPATH=/home/myproject DJANGO_SETTINGS_MODULE=settings \
/home/myproject/manage.py syncdb
(answer 'no' when asked about creating a superuser)
$ PYTHONPATH=/home/myproject DJANGO_SETTINGS_MODULE=settings \
./dbpickle.py --load --file=myproject.pickle
This would effectively convert your database from PostgreSQL to SQLite
through Django.
"""
from django.db import transaction
import cPickle
import logging
from django.db import models
from imp import load_source
try:
set # Only available in Python 2.4+
except NameError:
from sets import Set as set # Python 2.3 fallback
def dump(filepath):
"""
Pickle all Django objects from the database and save them into the
given file.
"""
objects = {} # model instances
m2m_lists = [] # ManyToMany relations between model instances
for model in models.get_models():
meta = model._meta
app, model_name = meta.app_label, meta.module_name
# get all many-to-many relation field names for this model
m2ms = [m2m.name for m2m in meta.many_to_many]
for obj in model.objects.all():
logging.info('dumping %s.%s %s' % (app, model_name, obj))
pk = obj._get_pk_val()
objects[app, model_name, pk] = obj
for m2m in m2ms:
# store many-to-many related objects for every
# many-to-many relation of this object
foreign_objs = getattr(obj, m2m).all()
logging.info('dumping %s.%s.%s x %d' % (app, model_name, m2m, len(foreign_objs)))
m2m_lists.append((obj, m2m, tuple(foreign_objs)))
# pickle all objects and many-to-many relations on disk
cPickle.dump((objects, m2m_lists), file(filepath, 'w'))
#@transaction.commit_on_success
def load(filepath, pluginpath=None):
"""
Unpickle Django objects from the given file and save them into the
database.
"""
# load the plugin if specified on the command line
if pluginpath:
plugin = load_source('plugin', pluginpath)
else:
plugin = None
# get the hook functions from the plugin
hooks = {}
for hookname in 'pre_save', 'pre_walk':
hooks[hookname] = getattr(plugin, '%s_hook' % hookname, lambda obj: False)
# unpickle objects and many-to-many relations from disk
objects, m2m_lists = cPickle.load(file(filepath))
# Find distinct models of all unpickled objects and delete all
# objects before loading. Note that models which have not been
# dumped are not emptied.
models = set( [obj.__class__ for obj in objects.itervalues()] )
for model in models:
for obj in model._default_manager.all():
obj.delete()
# load all objects
while objects:
key, obj = objects.popitem()
load_recursive(objects, obj, hooks)
# load all many-to-many relations
for obj1, m2m, foreign_objs in m2m_lists:
meta1 = obj1._meta
for obj2 in foreign_objs:
meta2 = obj2._meta
logging.info('loading ManyToMany %s.%s.%s -> %s.%s.%s' % (
meta1.app_label, meta1.module_name, obj1._get_pk_val(),
meta2.app_label, meta2.module_name, obj2._get_pk_val()))
getattr(obj1, m2m).add(obj2)
load = transaction.commit_on_success(load)
def load_recursive(objects, obj, hooks):
"""
Save the given object into the database. If the object has
ForeignKey relations to other objects, first make sure they are
already saved (and repeat recursively).
"""
meta = obj._meta
hooks['pre_walk'](obj)
for field in meta.fields:
if isinstance(field, models.ForeignKey):
related_meta = field.rel.to._meta
related_app = related_meta.app_label
related_model = related_meta.module_name
related_pk_val = getattr(obj, field.name+'_id')
try:
related_obj = objects.pop((related_app,
related_model,
related_pk_val))
load_recursive(objects, related_obj, hooks)
except KeyError:
logging.debug('probably loaded already: '
'%(related_app)s.%(related_model)s '
'%(related_pk_val)s' % locals())
logging.info('loading %s.%s %s' % (
meta.app_label,
meta.module_name,
obj._get_pk_val()))
try:
hooks['pre_save'](obj)
obj.save()
except Exception, e:
logging.error('%s while saving %s' % (e, obj))
if __name__ == '__main__':
from optparse import OptionParser
p = OptionParser()
p.add_option('-d', '--dump', action='store_const', const='dump', dest='action', help='Dump all Django objects into a file')
p.add_option('-l', '--load', action='store_const', const='load', dest='action', help='Load all Django objects from a file')
p.add_option('-f', '--file', default='djangodb.pickle', help='Specify file path [djangodb.pickle]')
p.add_option('-p', '--plugin', help='Use .py plugin for preprocessing objects to load')
p.add_option('-v', '--verbose' , action='store_const', const=logging.DEBUG, dest='loglevel', help='Show verbose output for debugging')
p.add_option('-q', '--quiet' , action='store_const', const=logging.FATAL, dest='loglevel', help='No output at all')
(opts, args) = p.parse_args()
loglevel = opts.loglevel or logging.INFO
try:
# Python 2.4+ syntax
logging.basicConfig(level=loglevel, format='%(levelname)-8s %(message)s')
except TypeError:
# Python 2.3
logging.basicConfig()
if opts.action == 'dump':
dump(opts.file)
elif opts.action == 'load':
load(opts.file, opts.plugin)
else:
print 'Please specify --dump or --load'
|
More like this
- Template tag - list punctuation for a list of items by shapiromatron 10 months, 1 week ago
- JSONRequestMiddleware adds a .json() method to your HttpRequests by cdcarter 10 months, 2 weeks ago
- Serializer factory with Django Rest Framework by julio 1 year, 5 months ago
- Image compression before saving the new model / work with JPG, PNG by Schleidens 1 year, 6 months ago
- Help text hyperlinks by sa2812 1 year, 6 months ago
Comments
This is excellent! Thanks for the help! I migrated in reverse, from SQLite3 to Postgres, and this worked like a charm!
#
I was able to migrate my data from MySQL -> SQLite using this method, no problem. But I'm getting all kinds of transaction errors when I try to load into "postgresql_psycopg2". Any ideas what I'm screwing up?
#
Yesem,
I haven't had a need for dbpickle.py for a long time now, so I really can't tell right away what could be the reason for the aborted transaction on PostgreSQL. I did load data on PostgreSQL when I created dbpickle, and had no issues back then.
It might be that something has changed in the Django ORM, or you might have a data structure which didn't exist in my data.
I got another private message with a similar report. I'll look into this. If you have a smallish database which causes this problem and which you could share with me, please contact me so I can take a look. My email is akaihol plus-sign django at-sign ambitone dot com.
#
Yesem,
I had transaction errors too but they were caused by strings too big for the data model. Apparently neither django nor sqlite3 checked for those but postgresql did. So I updated the data model to accommodate the data and the load operation succeeded. Then I had to update the sequences in the database as the script did not do it. I did : alter sequence xxx_id_seq restart with yyy; Where I obtain yyy like this: select max(id) from xxx;
After that, my application runs fine with postgresql.
#
Please login first before commenting.