"""
ModelPagination
Designed and coded by Cal Leeming.
Many thanks to Harry Roberts for giving us a heads-up on how to do this properly!
You may notice the class is almost exactly the same as Django's pagination, give or take :)
http://docs.djangoproject.com/en/dev/topics/pagination/?from=olddocs
So in most cases you can use this as a near drop-in replacement (it takes a model manager such as Post.objects rather than an object list).
Although, if you are looking at using this, you probably won't just "drop it in" lol.
----------------------------------------------------------------------------
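This is a heavily optimized way of paginating datasets of over 1 million records.
It uses MAX() rather than COUNT(): the database can read MAX(id) straight off the
primary-key index, whereas COUNT() usually has to walk every row, so MAX() is far faster.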
EXAMPLE:
>>> _t = time.time(); x = Post.objects.aggregate(Max('id')); "Took %ss"%(time.time() - _t )
'Took 0.00103402137756s'
>>> _t = time.time(); x = Post.objects.aggregate(Count('id')); "Took %ss"%(time.time() - _t )
'Took 0.92404794693s'
>>>
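To make the arithmetic concrete, here is a minimal framework-free sketch of what the
class below does once it has MAX(id): work out how many pages there are, and which
half-open ID range a given page covers. The helper names num_pages_from_max_id() and
id_range_for_page() are hypothetical and not part of ModelPagination itself.

```python
# Hypothetical helpers mirroring ModelPagination's arithmetic, without Django.

def num_pages_from_max_id(max_id, items_per_page):
    # An empty table makes MAX(id) come back as None; treat that as 0.
    max_id = max_id or 0
    # Page N covers IDs [(N - 1) * items_per_page, N * items_per_page),
    # so the page holding max_id is max_id // items_per_page + 1.
    return max_id // items_per_page + 1


def id_range_for_page(page_number, items_per_page):
    # Half-open range [start, end), matching filter(id__gte=start, id__lt=end).
    start = items_per_page * (page_number - 1)
    end = items_per_page * page_number
    return start, end


print(num_pages_from_max_id(5000000, 1000))  # 5001
print(id_range_for_page(3, 1000))            # (2000, 3000)
```

Because the ranges start at 0 and IDs usually start at 1, page 1 holds one item fewer
than the others, exactly as in the class.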
This does mean that once you start deleting rows, the IDs no longer give an accurate
count: delete 50 rows and your exact count() won't match MAX(id) any more. That's okay
for pagination, because for SEO we want items to stay on the page a search engine
originally indexed them on. With COUNT()/OFFSET pagination, deleting items shifts every
later item backwards through the pages, so you end up with inconsistent SEO on archive
pages. The cost is that pages containing deleted IDs show fewer than items_per_page
items. If this doesn't make sense, go figure it out for yourself, it's 2am ffs ;p
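A toy sketch of the point above, using plain Python lists instead of the database (the
function names and the 5-items-per-page setup are hypothetical): offset slicing moves
items onto different pages after a delete, while ID-range bucketing leaves every
surviving item where it was and just makes the affected page shorter.

```python
# Hypothetical toy data: IDs 1..20, five items per page.

def page_by_offset(ids, page_number, per_page):
    # like Post.objects.all()[start:end]: an item's position decides its page
    start = per_page * (page_number - 1)
    return ids[start:start + per_page]


def page_by_id_range(ids, page_number, per_page):
    # like filter(id__gte=start, id__lt=end): an item's ID decides its page
    start = per_page * (page_number - 1)
    end = per_page * page_number
    return [i for i in ids if start <= i < end]


ids = list(range(1, 21))
after = [i for i in ids if i not in (2, 3)]  # delete two rows from page 1

print(page_by_offset(ids, 2, 5))      # [6, 7, 8, 9, 10]
print(page_by_offset(after, 2, 5))    # [8, 9, 10, 11, 12]  (6 and 7 slid back to page 1)
print(page_by_id_range(ids, 2, 5))    # [5, 6, 7, 8, 9]
print(page_by_id_range(after, 2, 5))  # [5, 6, 7, 8, 9]  (unchanged)
print(page_by_id_range(after, 1, 5))  # [1, 4]  (page 1 just gets shorter)
```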
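Next, we use ID seeking (a range filter on id) rather than OFFSET. With OFFSET the
database has to step over every row before the first one you want, whereas an id range
can jump straight there using the primary-key index, so again, this is a shitton faster: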
EXAMPLE:
>>> _t = time.time(); x = map(lambda x: x, Post.objects.filter(id__gte=400000, id__lt=400500).all()); print "Took %ss"%(time.time() - _t)
Took 0.0467309951782s
>>> _t = time.time(); _res = map(lambda x: x, Post.objects.all()[400000:400500]); print "Took %ss"%(time.time() - _t)
Took 1.05785298347s
>>>
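Roughly, the two querysets above boil down to LIMIT/OFFSET versus a primary-key range in
SQL. The sketch below uses the standard library's sqlite3 with a hypothetical in-memory
post table (IDs 1 to 1000, no gaps) just to show the two query shapes; it does not
reproduce the timings. With IDs starting at 1, OFFSET 400 begins at id 401, so the
matching range is 401 to 451.

```python
import sqlite3

# Hypothetical stand-in for the Post table: IDs 1..1000 with no gaps.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE post (id INTEGER PRIMARY KEY, message TEXT)')
conn.executemany('INSERT INTO post (id, message) VALUES (?, ?)',
                 [(i, 'message %d' % i) for i in range(1, 1001)])

# OFFSET: the database steps over the first 400 rows before returning 50.
offset_rows = conn.execute(
    'SELECT id FROM post ORDER BY id LIMIT 50 OFFSET 400').fetchall()

# ID seeking: a range on the primary key, which the index can jump straight to.
seek_rows = conn.execute(
    'SELECT id FROM post WHERE id >= ? AND id < ? ORDER BY id',
    (401, 451)).fetchall()

print(offset_rows == seek_rows)         # True
print(offset_rows[0], offset_rows[-1])  # (401,) (450,)
```

Once there are gaps in the IDs the two stop matching: the range query simply returns
fewer rows instead of pulling later rows forward.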
By using this seeking method (which, btw, is useful well beyond pagination) on a table
with 5 million rows, we save 0.92s on the row count and 1.01s on fetching the items,
about 1.93s per page view. That may not seem like much, but with 1024 concurrent users
it adds up to roughly 1,976 seconds of database time for one page view each, which makes
a huge difference.
If you have any questions or problems, feel free to contact me on
cal.leeming [at] simplicitymedialtd.co.uk
"""
from django.core.paginator import EmptyPage
from django.db.models import Max


class ModelPagination:
    model = None
    items_per_page = None
    count = None
    num_pages = None
    page_range = None

    def __init__(self, model, items_per_page):
        # model is a manager or queryset, e.g. Post.objects
        self.model = model
        self.items_per_page = items_per_page
        # MAX(id) stands in for the row count (see above); an empty table
        # returns None, so fall back to 0
        self.count = self.model.aggregate(Max('id'))['id__max'] or 0
        # page N covers IDs [(N - 1) * items_per_page, N * items_per_page)
        self.num_pages = divmod(self.count, self.items_per_page)[0] + 1
        # built per instance: a shared class-level list would keep growing
        # with every new paginator
        self.page_range = list(range(1, self.num_pages + 1))

    def page(self, page_number):
        if page_number > self.num_pages:
            raise EmptyPage("That page contains no results")
        if page_number <= 0:
            raise EmptyPage("That page number is less than 1")
        start = self.items_per_page * (page_number - 1)
        end = self.items_per_page * page_number
        # seek on the ID range instead of using OFFSET
        object_list = self.model.filter(id__gte=start, id__lt=end)
        return ModelPaginationPage(object_list, page_number, self.count, start, end, self)
class ModelPaginationPage:
    object_list = None
    number = None
    count = None
    start = None
    end = None
    paginator = None

    def __unicode__(self):
        return "<Page %s of %s>" % (self.number, self.paginator.num_pages)

    def __init__(self, object_list, number, count, start, end, paginator):
        self.number = number
        self.count = count
        self.object_list = object_list
        self.start = start
        self.end = end
        self.paginator = paginator

    def has_next(self):
        # compare against the number of pages, not the MAX(id) count
        return self.number < self.paginator.num_pages

    def has_previous(self):
        return self.number > 1

    def has_other_pages(self):
        # call the methods: a bound method on its own is always truthy
        return self.has_next() or self.has_previous()

    def next_number(self):
        return self.number + 1

    def previous_number(self):
        return self.number - 1

    def start_index(self):
        # note: these are ID boundaries, not 1-based item positions
        return self.start

    def end_index(self):
        return self.end
###############################################################################
# OUR EXAMPLE USAGE
###############################################################################
import time

from django.shortcuts import render_to_response
from django.template import RequestContext

# Post (our model) and make_title() (our title helper) come from our own
# project and are not shown here.


def archive(request, *args, **kwargs):
    _t = time.time()
    # 4chan
    if kwargs.get('feed') == '4chan':
        ret = Post.objects
        url = '/archive/4chan-page-'
    else:
        raise Exception("Invalid feed specified")
    # calculate what page we are on
    page_num = int(args[0]) if args and args[0] else 1
    # create the pagination object
    _items_per_page = 1000
    pagination = ModelPagination(ret, _items_per_page)
    # extract the items from the page
    page = pagination.page(page_num)
    items = list(map(lambda x: {
        'id': x.get('id'),
        'username': x.get('username'),
        'title': make_title(x.get('message'), x.get('image_filename'), x.get('username')),
        'url': "/fcp/%s-%s.html" % (make_title(x.get('message'), x.get('image_filename'), x.get('username')), x.get('id')),
        'partial_message': x.get('message')[:256] if x.get('message') else None,
        'created': x.get('created'),
        'image_url': x.get('image_url'),
    }, page.object_list.values('id', 'username', 'message', 'image_filename', 'created', 'image_url')))
    context = RequestContext(request, {
        'url': url,
        'page_num': page_num,
        'loading_time': time.time() - _t,
        'page': page,
        'items': items,
        'pagination': pagination,
    })
    return render_to_response('lazylittlegirl/archive/results.html', context_instance=context)
"""
<!-- Here is some example usage in a template. Again, this is just copied and pasted out of one of our projects, and is not intended as a unit test or anything like that. -->
<div id="content">
<ol>
    {% for item in items %}
    <li class="li1">
        <div class="box1">
            <a href="{{item.url}}" alt="{{item.title}}" title="{{item.title}}" target="_blank">Post #{{item.id}}</a> - {{item.created}} by {{item.username}}
        </div>
    </li>
    {% endfor %}
</ol>
<br />
<hr />
<div id="pagenumbers"><b>Pages:</b>
    {% for xpage in pagination.page_range %}
        {% if page.number == xpage %}
            [<b>{{xpage}}</b>]
        {% else %}
            <a title="Page {{xpage}} of {{pagination.num_pages}}" alt="Page {{xpage}} of {{pagination.num_pages}}" href="{{url}}{{xpage}}.html">{{xpage}}</a>
        {% endif %}
    {% endfor %}
</div>
"""
Comments
What if the model's PK isn't numeric, like a UUID? Does this still work?
Great snippet! Since you're focusing on performance, have you thought about using a list comprehension instead of map & lambda? Generally map is quicker, but once you introduce a lambda it tends to fall behind.
You can accomplish the same thing with something like this
@qmandx: Sadly, because UUIDs are not numerically incremental, this code definitely would not work. However, if you added a second column as an auto-incrementing unsigned INT(11) primary key (called _id or id2 or something) and paginated on that instead of id, it would work fine. If you physically delete data rather than flagging it as deleted, though, you can end up with some pages having fewer items than others. Hope this makes sense.
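A rough sketch of that suggestion, with sqlite3 standing in for MySQL and hypothetical table and column names (post, seq, uuid): the UUID is kept as a unique column, while an integer primary key that the database assigns in increasing order drives MAX() and the range filter. With the class above you would also need to swap the hard-coded 'id' for the new column.

```python
import sqlite3
import uuid

# Hypothetical schema: UUIDs stay as a unique column, and an integer
# primary key (auto-assigned by SQLite, starting at 1) numbers the rows.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE post (seq INTEGER PRIMARY KEY, uuid TEXT UNIQUE NOT NULL)')
conn.executemany('INSERT INTO post (uuid) VALUES (?)',
                 [(str(uuid.uuid4()),) for _ in range(25)])

PER_PAGE = 10
# MAX(seq) plays the role that MAX(id) plays in ModelPagination.
max_seq = conn.execute('SELECT MAX(seq) FROM post').fetchone()[0] or 0
num_pages = max_seq // PER_PAGE + 1


def get_page(page_number):
    # same half-open range seek as ModelPagination.page(), but on seq
    start = PER_PAGE * (page_number - 1)
    end = PER_PAGE * page_number
    return conn.execute(
        'SELECT seq, uuid FROM post WHERE seq >= ? AND seq < ? ORDER BY seq',
        (start, end)).fetchall()


print(num_pages)                                              # 3
print([len(get_page(n)) for n in range(1, num_pages + 1)])  # [9, 10, 6]
```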
@thurloat: Ah, I still haven't come to terms with lambda falling out of favour, so I haven't used the newer recommended syntax ;( At some point, though, I will definitely do some benchmarks between the two, in an attempt to convince myself to ditch lambda ;p Thank you for letting me know, though!
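For anyone wanting to run that comparison, here is a minimal timeit sketch. The rows and make_title() are hypothetical placeholders shaped like the view's .values() output, and the timings will depend on your Python version and data, so no numbers are claimed here; the assert just confirms both forms build the same list.

```python
import timeit

# Hypothetical rows shaped like the view's .values() output.
rows = [{'id': i, 'username': 'user%d' % i, 'message': 'msg %d' % i,
         'image_filename': None} for i in range(1000)]


def make_title(message, image_filename, username):
    # Placeholder only: the snippet's real make_title() isn't shown.
    return ' '.join(part for part in (username, message, image_filename) if part)


def with_map():
    # map + lambda, as in the archive() view
    return list(map(lambda x: {
        'id': x.get('id'),
        'title': make_title(x.get('message'), x.get('image_filename'), x.get('username')),
    }, rows))


def with_list_comp():
    # the list comprehension suggested above
    return [{
        'id': x.get('id'),
        'title': make_title(x.get('message'), x.get('image_filename'), x.get('username')),
    } for x in rows]


assert with_map() == with_list_comp()  # both build the same list
for fn in (with_map, with_list_comp):
    print(fn.__name__, timeit.timeit(fn, number=100))
```

When a function call like make_title() runs for every item, it may dominate the cost either way, so it's worth benchmarking with your real helper.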
I've added an example template to show how it would be used, similar to the docs :)