import gc


def queryset_iterator(queryset, chunksize=1000):
    """
    Iterate over a Django QuerySet ordered by the primary key.

    This method loads at most chunksize (default: 1000) rows into memory
    at a time, whereas Django would normally load all rows into memory.
    Using the iterator() method alone only stops Django from preloading
    all the model instances.

    Note that this implementation does not support ordered querysets:
    any existing ordering is replaced by ordering on the primary key.
    """
    # Find the highest pk; first() returns None for an empty queryset.
    last = queryset.order_by('-pk').first()
    if last is None:
        return
    pk = 0  # assumes positive integer primary keys
    last_pk = last.pk
    queryset = queryset.order_by('pk')
    while pk < last_pk:
        # Fetch the next chunk of rows whose pk is above the last one seen.
        for row in queryset.filter(pk__gt=pk)[:chunksize]:
            pk = row.pk
            yield row
        gc.collect()
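The chunking idea behind the snippet can be tried without a Django project. The following is only a minimal sketch of the same technique (fetch rows with a pk greater than the last one seen, in pk order, one chunk at a time), written against the standard-library sqlite3 module. The `item` table, its columns and the `iter_rows_by_pk` helper are hypothetical names made up for this illustration, and unlike the snippet it stops on an empty chunk instead of looking up the last pk first.

```python
import sqlite3


def iter_rows_by_pk(conn, chunksize=1000):
    """Yield (id, name) rows from the hypothetical `item` table in pk order."""
    last_pk = 0  # assumption: positive integer primary keys
    while True:
        # Only rows past the last pk seen, at most `chunksize` of them.
        chunk = conn.execute(
            "SELECT id, name FROM item WHERE id > ? ORDER BY id LIMIT ?",
            (last_pk, chunksize),
        ).fetchall()
        if not chunk:
            return  # nothing left past last_pk
        for row in chunk:
            yield row
        last_pk = chunk[-1][0]  # resume after the last pk in this chunk


# Demo data: 25 rows in an in-memory database, read 10 at a time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO item (id, name) VALUES (?, ?)",
    [(i, "item-%d" % i) for i in range(1, 26)],
)
rows = list(iter_rows_by_pk(conn, chunksize=10))
```

Each query holds at most one chunk of rows at a time, which is the property queryset_iterator relies on.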
Comments
Interesting
Django does not load all rows into memory up front, but it caches the results as you iterate over them. By the end, everything is in memory (unless you use .iterator()). In most cases this is not a problem.
I had memory problems when looping over huge querysets. I solved them with this:
Check that connection.queries is empty. With settings.DEBUG == True, Django stores every query there. (Alternatively, replace the list with a dummy object that does not store anything.)
Use queryset.iterator() to disable the internal cache.
Use values_list() if you know you need only some values.
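The "dummy object" idea from the first tip could look roughly like the sketch below. `NullQueryLog` is a hypothetical name, and the sketch assumes the query log is only ever filled through append() or extend(). How such an object would be attached to the connection depends on the Django version, so that step is not shown here.

```python
class NullQueryLog(list):
    """A list stand-in that silently drops everything added to it."""

    def append(self, item):
        # Discard the entry instead of storing it.
        pass

    def extend(self, items):
        # Discard all entries instead of storing them.
        pass


# Demo: entries "logged" here never accumulate in memory.
log = NullQueryLog()
log.append({"sql": "SELECT 1", "time": "0.001"})
log.extend([{"sql": "SELECT 2", "time": "0.001"}])
stored = len(log)
```

In a real project, django.db.reset_queries() clears the stored queries as well, and the other two tips are ordinary queryset calls: queryset.iterator() to bypass the result cache, and queryset.values_list('id', 'name') (field names are placeholders) to fetch plain tuples instead of model instances.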