21

I turned pagination on in Django Rest framework and it appears to be incredibly slow. Count looks like the culprit, and is taking hundreds of milliseconds to return each time due to the millions of rows in the tables.

I am using postgresql as the database. Is there any way to not count the rows and still use pagination? The performance was fine before this was enabled if I manually filtered the queryset.

Smills
  • 736
  • 1
  • 6
  • 17
  • Paste the code to make the question clearer. – Animesh Sharma Jul 31 '15 at 07:10
  • There's not a whole lot in the way of code to paste. The only thing I added that makes it slow is this modification to settings.py: 'DEFAULT_PAGINATION_CLASS': 'api.pagination.StandardResultsSetPagination' – Smills Jul 31 '15 at 07:14

6 Answers6

19

The issue is, that the query used to count is the same potentially complex one as used to fetch the data. That's rather wasteful. PageNumberPagination uses Django's own Paginator internally.

To make the query for the count simpler override the paginator class DRF uses:

from django.core.paginator import Paginator
from django.utils.functional import cached_property
from rest_framework.pagination import PageNumberPagination

class FasterDjangoPaginator(Paginator):
    @cached_property
    def count(self):
        # only select 'id' for counting, much cheaper
        return self.object_list.values('id').count()


class FasterPageNumberPagination(PageNumberPagination):
    django_paginator_class = FasterDjangoPaginator
Florian
  • 2,562
  • 5
  • 25
  • 35
  • Very good solution, thanks! Will make my app a bit faster everywhere :) – Corentin S. Oct 22 '19 at 10:20
  • 2
    Before using this everywhere, note that selecting an 'id' will change queries involving aggregation (the 'id', will be appended to whatever else you are grouping by). – Zev Feb 04 '22 at 17:52
  • After every new record, this needs to restart the application server to show the correct count. I think cached_property must not be used here. – Mirat Can Bayrak Jan 26 '23 at 13:50
16

Override the get_paginated_response method of your pagination class, and do not include the count. You can refer to the base implementation of the PageNumberPagination class to see what you should return.

from rest_framework.pagination import PageNumberPagination
from collections import OrderedDict # requires Python 2.7 or later

class PageNumberPaginationWithoutCount(PageNumberPagination):
    # Set any other options you want here like page_size

    def get_paginated_response(self, data):
        return Response(OrderedDict([
            ('next', self.get_next_link()),
            ('previous', self.get_previous_link()),
            ('results', data)
        ]))

Then in your settings.py, set DEFAULT_PAGINATION_CLASS to your new pagination class.

DEFAULT_PAGINATION_CLASS = 'path.to.PageNumberPaginationWithoutCount'

This approach is used in the example in the pagination docs.

Edit: from the comments below it sounds like this might not be enough to prevent the slow sql query, so you might need to override paginate_queryset as well.

Alasdair
  • 298,606
  • 55
  • 578
  • 516
  • 6
    "Override the get_paginated_response method of your pagination class, and do not include the count." Seconded, yup. You may also want to consider using the CursorPagination for large datasets, as it wont get slower the further through the user pages, unlike the page and offset styles. – Tom Christie Jul 31 '15 at 13:57
  • 5
    I tried that, and it removes the count from the result, but still runs it according to django debug toolkit. However, overriding paginate_queryset(self, queryset, request, view=None): And setting self.count = 0 instead of self.count = _get_count(queryset) sorted it. I also just disabled next and previous as i don't need them. – Smills Jul 31 '15 at 15:25
  • 3
    How did you override paginate_queryset ? – dranxo Dec 14 '15 at 17:18
6

If you are ok without count, next and previous links, Following custom class can be used.

import sys
from collections import OrderedDict

from django.core.paginator import Paginator
from django.utils.functional import cached_property
from rest_framework.pagination import PageNumberPagination
from rest_framework.response import Response


class CustomPaginatorClass(Paginator):
    @cached_property
    def count(self):
        return sys.maxsize


# To Avoid large table count query, We can use this paginator class
class LargeTablePagination(PageNumberPagination):
    django_paginator_class = CustomPaginatorClass

    def get_paginated_response(self, data):
        return Response(OrderedDict([
            ('page', self.page.number),
            ('results', data)
        ]))
xrage
  • 4,690
  • 4
  • 25
  • 31
  • PageNumberPagination doesn't seem to have a django_paginator_class though? Maybe it used to? This is on the right track though. – getup8 Jan 11 '20 at 01:05
1

This is an example of a paginator that keeps the next/previous links working. It fetches an extra row at the end to check whether there is another page:

class NoCountPagination(LimitOffsetPagination):
    def get_paginated_response(self, data):
        return Response(
            {
                "next": self.get_next_link(),
                "previous": self.get_previous_link(),
                "results": data,
            }
        )

    def paginate_queryset(self, queryset, request, view=None):
        self.offset = self.get_offset(request)
        self.limit = self.get_limit(request)

        # Get one extra element to check if there is a "next" page
        q = list(queryset[self.offset : self.offset + self.limit + 1])
        self.count = self.offset + len(q) if len(q) else self.offset - 1
        if len(q) > self.limit:
            q.pop()

        self.request = request
        if self.count > self.offset + self.limit and self.template is not None:
            self.display_page_controls = True

        return q

    def get_paginated_response_schema(self, schema):
        ret = super().get_paginated_response_schema(schema)
        del ret["properties"]["count"]
        return ret
Paolo Bonzini
  • 1,900
  • 15
  • 25
0

The other answers either didn't work for me or were still performing the extra COUNT(*) query.

This will get rid of all pagination, the count query, and will just return the JSON response:

from rest_framework.pagination import PageNumberPagination


class NoCountPagination(PageNumberPagination):
    page_size = None

    def get_paginated_response(self, data):
        return Response({
            'results', data
        })

To use it:

from rest_framework import viewsets
from .models import MyModel
from .serializers import MySerializer


class CustomApiViewSet(viewsets.ReadOnlyModelViewSet):
    """
    Simple viewset for viewing MyModels (as a list, or individually).
    """
    queryset = MyModel.objects.all()
    serializer_class = MySerializer
    pagination_class = NoCountPagination

Beware, this will return all rows from your queryset. In almost all cases, I think it's probably better to just use PageNumberPagination as-is or maybe use @Florian's solution above to speed it up a bit.

getup8
  • 6,949
  • 1
  • 27
  • 31
0

Adding to getup8's response, I was able to get it working where it doesn't return the count but also doesn't return all of the rows (assuming you have PAGE_SIZE set to a reasonable number in your site's REST_FRAMEWORK settings).

from rest_framework.pagination import LimitOffsetPagination
    
class NoCountPaginator(LimitOffsetPagination):
    def get_count(self, queryset):
        return 99999999

    def get_paginated_response(self, data):
        return Response(OrderedDict([
            ('results', data)
        ]))

And to use it:

from rest_framework import viewsets
from .models import MyModel
from .serializers import MySerializer


class CustomApiViewSet(viewsets.ReadOnlyModelViewSet):
    queryset = MyModel.objects.all()
    serializer_class = MySerializer
    pagination_class = NoCountPaginator

I had PAGE_SIZE set to 25, so the API always returns only the top 25 rows and it no longer runs the count query.

Scott Weaver
  • 51
  • 1
  • 2