Django chunked queryset iterator

Author: mingdongt
Posted: January 9, 2017
Language: Python
Version: Not specified
Tags: django python database queryset iterator memoryerror
Score: 1 (after 1 ratings)

The function slices a queryset into smaller querysets of at most chunk_size objects each and yields them one at a time. It is meant to avoid a MemoryError when processing a huge queryset, as well as database errors caused by pulling the whole table at once. Because each chunk is selected by primary-key range rather than by offset, concurrent database modifications should not cause existing entries to be repeated or skipped during iteration.

def chunked_queryset(queryset, chunk_size):
    """Yield successive querysets of at most chunk_size objects, ordered by pk.

    Assumes positive integer primary keys, since iteration starts after pk 0.
    """
    start_pk = 0
    queryset = queryset.order_by('pk')

    while True:
        # Primary keys of the entries that have not been yielded yet
        remaining_pks = queryset.filter(pk__gt=start_pk).values_list(
            'pk', flat=True)

        try:
            # pk of the last entry of a full chunk, if one is available
            end_pk = remaining_pks[chunk_size - 1]
        except IndexError:
            # Fewer than chunk_size entries left: take all the rest
            end_pk = remaining_pks.last()

        # No entry left (last() returns None on an empty queryset)
        if end_pk is None:
            break

        yield queryset.filter(pk__gt=start_pk, pk__lte=end_pk)

        start_pk = end_pk
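
The primary-key-range idea behind the snippet can be tried without a Django project. The following is only a rough sketch of the same technique in plain SQL, using the standard-library sqlite3 module; the `item` table, the `chunked_pks` helper, and the simulated delete are assumptions made for illustration and are not part of the original snippet.

```python
import sqlite3


def chunked_pks(conn, chunk_size):
    """Yield lists of primary keys from the hypothetical `item` table."""
    start_pk = 0  # like the snippet, assumes positive integer primary keys
    while True:
        rows = conn.execute(
            "SELECT id FROM item WHERE id > ? ORDER BY id LIMIT ?",
            (start_pk, chunk_size),
        ).fetchall()
        if not rows:
            break
        chunk = [row[0] for row in rows]
        yield chunk
        # Resume after the last key seen instead of counting an offset
        start_pk = chunk[-1]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO item (id) VALUES (?)", [(i,) for i in range(1, 8)])

chunks = []
for chunk in chunked_pks(conn, 3):
    chunks.append(chunk)
    # Simulate a concurrent delete of a row that was already processed
    conn.execute("DELETE FROM item WHERE id = 1")

print(chunks)
```

In this sketch, deleting an already-processed row does not shift the later chunks, whereas offset-based paging over the same data would skip id 4. With the snippet itself, usage would look like `for chunk in chunked_queryset(Entry.objects.all(), 1000): ...`, where `Entry` is a hypothetical model.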
