streaming dump_data

Author: kcarnold
Posted: March 29, 2009
Language: Python
Version: 1.0
Tags: dumpdata memoryerror stream dump_data queryset_foreach
Score: 2 (after 2 ratings)

A dumpdata command that streams its output to avoid MemoryErrors and logs progress as it goes. Most of the real work is done by snippets 1400 and 1401.

./manage.py dumpdata_stream --format=xml > big_dump.xml

This is basically the stock Django dumpdata with a few modifications. Django devs: it's hard to reuse parts of most Django management commands. A little refactoring could go a long way.

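# serialize_qs4e is defined in snippet 1401. Save that snippet as a module on
# your path and adjust this import; the name snippet_1401 is only a placeholder.
from snippet_1401 import serialize_qs4e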

from django.core.exceptions import ImproperlyConfigured
from django.core.management.base import BaseCommand, CommandError
from django.core import serializers
from django.utils.datastructures import SortedDict
import logging
import sys

from optparse import make_option

class Command(BaseCommand):
    option_list = BaseCommand.option_list + (
        make_option('--format', default='json', dest='format',
            help='Specifies the output serialization format for fixtures.'),
        make_option('--indent', default=None, dest='indent', type='int',
            help='Specifies the indent level to use when pretty-printing output'),
        make_option('-e', '--exclude', dest='exclude',action='append', default=[],
            help='App to exclude (use multiple --exclude to exclude multiple apps).'),
    )
    help = 'Output the contents of the database as a fixture of the given format. Streams to avoid MemoryErrors.'
    args = '[appname ...]'

    def handle(self, *app_labels, **options):
        from django.db.models import get_app, get_apps, get_models, get_model

        format = options.get('format','json')
        indent = options.get('indent',None)
        exclude = options.get('exclude',[])
        show_traceback = options.get('traceback', False)

        logging.basicConfig(level=logging.INFO)

        excluded_apps = [get_app(app_label) for app_label in exclude]

        if len(app_labels) == 0:
            app_list = SortedDict([(app, None) for app in get_apps() if app not in excluded_apps])
        else:
            app_list = SortedDict()
            for label in app_labels:
                try:
                    app_label, model_label = label.split('.')
                    try:
                        app = get_app(app_label)
                    except ImproperlyConfigured:
                        raise CommandError("Unknown application: %s" % app_label)

                    model = get_model(app_label, model_label)
                    if model is None:
                        raise CommandError("Unknown model: %s.%s" % (app_label, model_label))

                    if app in app_list.keys():
                        if app_list[app] and model not in app_list[app]:
                            app_list[app].append(model)
                    else:
                        app_list[app] = [model]
                except ValueError:
                    # This is just an app - no model qualifier
                    app_label = label
                    try:
                        app = get_app(app_label)
                    except ImproperlyConfigured:
                        raise CommandError("Unknown application: %s" % app_label)
                    app_list[app] = None

        # Check that the serialization format exists; this is a shortcut to
        # avoid collating all the objects and _then_ failing.
        if format not in serializers.get_public_serializer_formats():
            raise CommandError("Unknown serialization format: %s" % format)

        try:
            serializer_class = serializers.get_serializer(format)
        except KeyError:
            raise CommandError("Unknown serialization format: %s" % format)

        def get_querysets():
            for app, model_list in app_list.items():
                if model_list is None:
                    model_list = get_models(app)

                for model in model_list:
                    logging.info('Dumping model %s' % model)
                    yield model._default_manager.order_by(model._meta.pk.name)

        try:
            serialize_qs4e(serializer_class(), get_querysets(), stream=sys.stdout, indent=indent)
        except Exception as e:
            if show_traceback:
                raise
            raise CommandError("Unable to serialize database: %s" % e)
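
The command above hands a generator of querysets to serialize_qs4e, so nothing is collected up front. Snippet 1401 is not reproduced here, so the following is only a rough, Django-free sketch of the same streaming idea: items are pulled lazily, a small chunk at a time, and written to the output stream as they arrive. The names iter_in_chunks, stream_json, and get_sources are hypothetical and are not part of either snippet.

```python
import io
import json


def iter_in_chunks(items, chunk_size):
    """Yield lists of at most chunk_size items, so only one chunk is held at a time."""
    chunk = []
    for item in items:
        chunk.append(item)
        if len(chunk) >= chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def stream_json(sources, stream, chunk_size=100):
    """Write every item from every source as a single JSON array.

    Items are written as soon as they are read; the full list is never built.
    Returns the number of items written.
    """
    count = 0
    stream.write("[")
    for source in sources:
        for chunk in iter_in_chunks(source, chunk_size):
            for item in chunk:
                if count:
                    stream.write(", ")
                stream.write(json.dumps(item))
                count += 1
    stream.write("]")
    return count


def get_sources():
    # Hypothetical stand-in for get_querysets(): each source is created lazily.
    yield ({"model": "app.a", "pk": n} for n in range(3))
    yield ({"model": "app.b", "pk": n} for n in range(2))


out = io.StringIO()
written = stream_json(get_sources(), out, chunk_size=2)
```

In the real command the stream is sys.stdout and the sources are querysets ordered by primary key; this sketch only shows why memory use stays bounded by the chunk size rather than by the size of the dump.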

Comments

rage3 (on April 25, 2009):

Some help on how to use the snippets would be neat for newbies. Can I just pack all 3 snippets in one file and overwrite dumpdata.py? I tried, but keep getting this error:

manage.py dumpdata --format=xml > dump.xml

INFO:root:Dumping model [HTML_REMOVED]

INFO:root:qs4e: Getting list of objects

Error: Unable to serialize database: float division

wiz (on November 17, 2009):

saved my day!

pcollins (on April 15, 2010):

@rage3

To fix the float division error, modify the Queryset Foreach file (snippet 1400) at line 32, where it says return self.cur_idx / ....

Put that in a try/except ZeroDivisionError block, and in the exception handler put return 0.

That should fix it and get it working.
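
A minimal sketch of that guard, assuming the progress value is computed as a fraction of cur_idx over some total. Snippet 1400 is not shown on this page, so the class name ProgressTracker, the attribute total, and the method fraction_done are hypothetical; only cur_idx comes from the comment above.

```python
class ProgressTracker(object):
    """Hypothetical stand-in for the progress object in snippet 1400."""

    def __init__(self, total):
        self.total = total
        self.cur_idx = 0

    def fraction_done(self):
        # An empty queryset gives total == 0, which is what triggers the
        # "float division" error; report 0 progress instead of failing.
        try:
            return float(self.cur_idx) / self.total
        except ZeroDivisionError:
            return 0


empty = ProgressTracker(total=0)
half = ProgressTracker(total=4)
half.cur_idx = 2
```

With this guard, a model with no rows reports 0 progress and the dump moves on to the next model.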
