Faster pagination / model object seeking (10x faster in fact :o) for larger datasets (500k+)

"""
    ModelPagination
    Designed and Coded by Cal Leeming
    Many thanks to Harry Roberts for giving us a heads up on how to do this properly!

    You may also notice the class is almost exactly the same as the Django paginator, give or take :)
    http://docs.djangoproject.com/en/dev/topics/pagination/?from=olddocs
    So this means, in most cases, you can use this as a drop-in replacement.
    Although, if you are looking at using this, you would probably not just "drop it in" lol.

    ----------------------------------------------------------------------------

    This is a super optimized way of paginating datasets over 1 million records.
    It uses MAX() rather than COUNT(), because MAX(id) can be read straight off the
    primary key index, while COUNT() generally has to scan the rows.

    EXAMPLE:
    >>> _t = time.time(); x = Post.objects.aggregate(Max('id')); "Took %ss"%(time.time() - _t )
    'Took 0.00103402137756s'
    >>> _t = time.time(); x = Post.objects.aggregate(Count('id')); "Took %ss"%(time.time() - _t )
    'Took 0.92404794693s'
    >>>

    This does mean that if you go deleting things, the total won't be accurate:
    if you delete 50 rows, your exact count() isn't going to match MAX(id). That is
    okay for pagination, because for SEO, we want items to stay on the original page
    they were scanned on. With OFFSET paging, deleting items shifts everything after them
    backwards through the pages, so you end up with inconsistent SEO on archive pages.
    With id ranges, a page just ends up with fewer items. If this doesn't make sense,
    go figure it out for yourself, it's 2am ffs ;p

    Now, the next thing we do is use id seeking, rather than OFFSET, because again,
    this is a shitton faster:

    EXAMPLE:
    >>> _t = time.time(); x = map(lambda x: x, Post.objects.filter(id__gte=400000, id__lt=400500).all()); print "Took %ss"%(time.time() - _t)
    Took 0.0467309951782s
    >>> _t = time.time(); _res = map(lambda x: x, Post.objects.all()[400000:400500]); print "Took %ss"%(time.time() - _t)
    Took 1.05785298347s
    >>>

    By using this seeking method (which btw, can be implemented on anything, not just pagination)
    on a table with 5 million rows, we are saving 0.92s on row count, and 1.01s on item grabbing.
    This may not seem like much, but if you have 1024 concurrent users, this will make a huge
    difference.

    If you have any questions or problems, feel free to contact me on
    cal.leeming [at] simplicitymedialtd.co.uk

"""
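import time

from django.core.paginator import Paginator, InvalidPage, EmptyPage
from django.db.models import Max,Count,Q,F
from django.shortcuts import render_to_response
from django.template import RequestContext

# Post and make_title (used in the example view further down) come from the
# author's own project and are not part of this snippet.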

class ModelPagination:
    model = None
    items_per_page = None
    count = None
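    page_range = None

    def __init__(self, model, items_per_page):
        self.model = model
        self.items_per_page = items_per_page
        # MAX(id) comes back as None on an empty table, so fall back to 0
        self.count = self.model.aggregate(Max('id'))['id__max'] or 0
        self.num_pages = divmod(self.count, self.items_per_page)[0]+1

        # build the list per instance; a class-level list would be shared
        # (and keep growing) across every ModelPagination object
        self.page_range = [i+1 for i in range(self.num_pages)]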

    def page(self, page_number):
        if page_number > self.num_pages:
            raise EmptyPage("That page contains no results")

        if page_number <= 0:
            raise EmptyPage("That page number is less than 1")

        start = self.items_per_page * (page_number-1)
        end = self.items_per_page * page_number

        object_list = self.model.filter(id__gte=start, id__lt=end)
        return ModelPaginationPage(object_list, page_number, self.count, start, end, self)

class ModelPaginationPage:
    object_list = None
    number = None
    count = None
    start = None
    end = None
    paginator = None

    def __unicode__(self):
        # self.count holds MAX(id), so report the page total from the paginator
        return "<Page %s of %s>"%(self.number, self.paginator.num_pages)

    def __init__(self, object_list, number, count, start, end, paginator):
        self.number = number
        self.count = count
        self.object_list = object_list
        self.start = start
        self.end = end
        self.paginator = paginator

    def has_next(self):
        # compare against the page total, not the MAX(id) stored in self.count
        return self.number < self.paginator.num_pages

    def has_previous(self):
        return self.number > 1

    def has_other_pages(self):
        # call the methods; a bare bound method is always truthy
        return self.has_next() or self.has_previous()

    def next_number(self):
        return self.number + 1

    def previous_number(self):
        return self.number - 1

    def start_index(self):
        return self.start

    def end_index(self):
        return self.end

###############################################################################
# OUR EXAMPLE USAGE
###############################################################################
def archive(request, *args, **kwargs):
    _t = time.time()

    # 4chan
    if kwargs.get('feed') == '4chan':
        ret = Post.objects
        url = '/archive/4chan-page-'

    else:
        raise Exception("Invalid feed specified")

    # calculate what page we are on
    page_num = int(args[0]) if args and args[0] else 1

    # create the pagination object
    _items_per_page = 1000
    pagination = ModelPagination(ret, _items_per_page)
    
    # extract the items from the page
    page = pagination.page(page_num)

    items = map(lambda x: {
        'id' : x.get('id'),
        'username' : x.get('username'),
        'title' : make_title(x.get('message'), x.get('image_filename'), x.get('username')),
        'url' : "/fcp/%s-%s.html"%(make_title(x.get('message'), x.get('image_filename'), x.get('username')), x.get('id')),
        'partial_message' : x.get('message')[:256] if x.get('message') else None,
        'created': x.get('created'),
        'image_url' : x.get('image_url')

    }, page.object_list.values('id', 'username', 'message', 'image_filename', 'created', 'image_url'))

    context = RequestContext(request, {
        'url' : url,
        'page_num' : page_num,
        'loading_time' : time.time() - _t,
        'page' : page,
        'items' : items,
        'pagination' : pagination
    })

    return render_to_response('lazylittlegirl/archive/results.html', context_instance=context)


"""
<!-- Here is some example usage in a template, again this is just a copy and paste out of one of our projects, and not intended as a unit test or w/e -->
    <div id="content">
        <ol>
            {% for item in items %}
                <li class="li1">
                    <div class="box1">
                        <a href="{{item.url}}" alt="{{item.title}}" title="{{item.title}}" target="_blank">Post #{{item.id}}</a> - {{item.created}} by {{item.username}} 
                    </div>
                </li>
            {% endfor %}
        </ol>

   <br />
   <hr />
   
    <div id="pagenumbers"><b>Pages :</b>
        {% for xpage in pagination.page_range %}
            {% if page.number == xpage %}
                [<b>{{xpage}}</b>]
            {% else %}
                <a title="Page {{xpage}} of {{pagination.num_pages}}" alt="Page {{xpage}} of {{pagination.num_pages}}" href="{{url}}{{xpage}}.html">{{xpage}}</a>
            {% endif %}
        {% endfor %}
    </div>
"""

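The arithmetic behind ModelPagination can be tried without a database. The following is a minimal sketch in plain Python, not the snippet's own code: `num_pages`, `id_window`, `page_ids` and the sample ids are hypothetical names and data made up for illustration. It shows how MAX(id) stands in for the row count, and how deleted rows leave a page short instead of shifting items onto other pages.

```python
def num_pages(max_id, items_per_page):
    """Number of pages when MAX(id) stands in for COUNT(*)."""
    return max_id // items_per_page + 1


def id_window(page_number, items_per_page):
    """Half-open id range [start, end) served by a 1-based page number."""
    start = items_per_page * (page_number - 1)
    end = items_per_page * page_number
    return start, end


def page_ids(existing_ids, page_number, items_per_page):
    """Ids from existing_ids that fall inside the page's window."""
    start, end = id_window(page_number, items_per_page)
    return [i for i in existing_ids if start <= i < end]


# Hypothetical table: ids 1..25 with rows 12 and 13 deleted.
existing = [i for i in range(1, 26) if i not in (12, 13)]

print(num_pages(max(existing), 10))   # 3
print(id_window(2, 10))               # (10, 20)
print(page_ids(existing, 2, 10))      # [10, 11, 14, 15, 16, 17, 18, 19]
```

Page 2 comes back with eight items rather than ten, but ids 14 to 19 stay on page 2 no matter what gets deleted before them.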

Comments

gmandx (on November 29, 2010):

What if the models' PKs are not numeric, like UUIDs? Does this still work?

#

thurloat (on November 29, 2010):

Great snippet! Since you are focussing on performance, have you thought about using a list comprehension instead of map & lambda? Generally map is quicker, but once a lambda is involved it tends to fall behind.

You can accomplish the same thing with something like this:

items = [{"id": x.get('id'),
   "username": x.get('username'),
   "title": make_title(x.get('message'), x.get('image_filename'), x.get('username')),
   "partial_message": x.get('message')[:256] if x.get('message') else None,
  } for x in page.object_list.values('id', 'username', 'message', 'image_filename')]
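
Which form is quicker is easy to measure rather than guess. The following is a minimal sketch using the standard `timeit` module: `rows` is a made-up stand-in for the `.values()` queryset, `with_map` and `with_comprehension` are hypothetical names, and the timings will vary with the Python version and machine.

```python
import timeit

# Hypothetical stand-in for page.object_list.values(...): a list of dicts.
rows = [{"id": i, "username": "user%d" % i, "message": "hello " * 60}
        for i in range(1000)]


def with_map(rows):
    # list() keeps the result a list on Python 3, where map() is lazy
    return list(map(lambda x: {
        "id": x.get("id"),
        "username": x.get("username"),
        "partial_message": x.get("message")[:256] if x.get("message") else None,
    }, rows))


def with_comprehension(rows):
    return [{
        "id": x.get("id"),
        "username": x.get("username"),
        "partial_message": x.get("message")[:256] if x.get("message") else None,
    } for x in rows]


# Both forms build the same list; only the timing differs.
assert with_map(rows) == with_comprehension(rows)

print("map + lambda:  %.4fs" % timeit.timeit(lambda: with_map(rows), number=200))
print("comprehension: %.4fs" % timeit.timeit(lambda: with_comprehension(rows), number=200))
```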

#

sleepycal (on December 1, 2010):

@gmandx: Sadly, because UUIDs are not numerically incremental, this code would definitely not work. However, if you added a second column, as an auto-incrementing unsigned int(11) key (called _id or id2 or something), then you could use that in its place, and it'll work fine. If you delete data physically rather than flagging it, though, you can end up with some pages having fewer items than others. Hope this makes sense.

@thurloat: Ah, I still haven't come to terms with the fact they are removing lambda, so I haven't used the new recommended syntax ;( At some point though, I will definitely do some benchmarks between the two, in an attempt to convince myself to ditch lambda ;p Thank you for letting me know though!

#

sleepycal (on December 1, 2010):

I've added an example template to show how it would be used, similar to the docs :)

#

