streaming serializer

Author: kcarnold
Posted: March 29, 2009
Language: Python
Version: 1.0
Tags: dumpdata large memoryerror
Score: 1 (after 1 rating)

Trying ./manage.py dumpdata on a huge database and getting MemoryErrors? Here's part of your solution.

Snippet 1400 provides a queryset_foreach utility that we've found very useful. This snippet uses it on a serializer that can output to a stream, such as the XML serializer.
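
To make the batching idea concrete, here is a minimal sketch of a foreach-style helper. It is not the code from snippet 1400: the name `foreach_in_batches` and its exact behaviour are assumptions for illustration, and a plain list stands in for the queryset. The point is only that one slice is held in memory at a time.

```python
def foreach_in_batches(items, func, batch_size=50, progress_callback=None):
    """Call func on every item, holding only one batch in memory at a time.

    Hypothetical stand-in for queryset_foreach; `items` only needs to
    support slicing, which lists (and Django querysets) do.
    """
    processed = 0
    start = 0
    while True:
        # Pull the next slice; an empty slice means we are done.
        batch = list(items[start:start + batch_size])
        if not batch:
            break
        for item in batch:
            func(item)
        processed += len(batch)
        if progress_callback is not None:
            progress_callback(processed)
        start += batch_size
    return processed


# Usage: 120 items in batches of 50 gives three batches (50, 50, 20).
seen = []
progress = []
total = foreach_in_batches(list(range(120)), seen.append,
                           batch_size=50, progress_callback=progress.append)
```

The real utility also takes a `transaction` option, which this sketch leaves out.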

Management command coming momentarily...

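# queryset_foreach comes from snippet 1400. Save that snippet as a module and
# adjust this import to match; the module name below is only a placeholder.
from snippet_1400 import queryset_foreach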

def serialize_qs4e(serializer, querysets, stream, **options):
    # Split off the options meant for queryset_foreach; the rest go to the serializer.
    qs4e_options = {'transaction': False, 'batch_size': 50}
    for opt in ['batch_size', 'progress_callback', 'transaction']:
        val = options.pop(opt, None)
        if val is not None:
            qs4e_options[opt] = val

    serializer.options = options
    serializer.options['stream'] = stream
    serializer.stream = stream
    serializer.selected_fields = options.get("fields")

    def serialize_object(obj):
        # Write a single object to the stream; queryset_foreach calls this per object.
        serializer.start_object(obj)
        for field in obj._meta.local_fields:
            if field.serialize:
                if field.rel is None:
                    if serializer.selected_fields is None or field.attname in serializer.selected_fields:
                        serializer.handle_field(obj, field)
                else:
                    # A foreign key's attname ends in "_id"; strip it to match the field name.
                    if serializer.selected_fields is None or field.attname[:-3] in serializer.selected_fields:
                        serializer.handle_fk_field(obj, field)
        for field in obj._meta.many_to_many:
            if field.serialize:
                if serializer.selected_fields is None or field.attname in serializer.selected_fields:
                    serializer.handle_m2m_field(obj, field)
        serializer.end_object(obj)

    serializer.start_serialization()
    for queryset in querysets:
        queryset_foreach(queryset, serialize_object, **qs4e_options)
    serializer.end_serialization()
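
Why a stream-capable serializer matters can be shown without Django. The sketch below uses only the standard library's `XMLGenerator` to write each record to a stream as soon as it is visited, so the full document never sits in memory. The function name `stream_rows_as_xml` and the element names are illustrative assumptions, not Django's actual fixture format.

```python
import io
from xml.sax.saxutils import XMLGenerator


def stream_rows_as_xml(rows, stream):
    """Write each row to `stream` as soon as it is visited.

    `rows` may be any iterable (including a generator) of dicts shaped
    like {"pk": ..., "fields": {...}}; this shape is a made-up example.
    """
    xml = XMLGenerator(stream, encoding="utf-8")
    xml.startDocument()
    xml.startElement("objects", {})
    for row in rows:  # nothing is accumulated between iterations
        xml.startElement("object", {"pk": str(row["pk"])})
        for name, value in row["fields"].items():
            xml.startElement("field", {"name": name})
            xml.characters(str(value))
            xml.endElement("field")
        xml.endElement("object")
    xml.endElement("objects")
    xml.endDocument()


# Usage: a generator feeds rows one at a time into an in-memory stream.
def fake_rows():
    yield {"pk": 1, "fields": {"title": "first"}}
    yield {"pk": 2, "fields": {"title": "second"}}


out = io.StringIO()
stream_rows_as_xml(fake_rows(), out)
xml_text = out.getvalue()
```

In practice the stream would be an open file or `sys.stdout` rather than a `StringIO`, which is used here only so the result can be inspected.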
