Best way to paginate a raw SQL query in a Django REST ListAPI view?

Question

I have a raw SQL query that I use to build a query set for a Django REST ListAPI view. It is along the lines of the following (please excuse the meaningless names):

class MyView(ListAPIView):
    serializer_class = MySerializer
    paginate_by = 10
    def get_queryset(self):
        params = {
            "uid": str(self.request.user.id),
            "param": str(self.kwargs['param'])
        }
        query = 'SELECT f.id ' \
            'FROM myapp_foo f, myapp_bar b ' \
            'WHERE b.foo_id = f.id AND ' \
            'b.param >= %(param)s AND ' \
            'f.dt_tm >= NOW() AND ' \
            '(SELECT COUNT(*) FROM myapp_baz z ' \
            'WHERE z.user_id = %(uid)s AND ' \
            'z.qux_id = f.qux_id) = 0 ' \
            'ORDER BY f.dt_tm;'
        return Foo.objects.raw(query, params)

This gives the error:

object of type 'RawQuerySet' has no len()

I would like to calculate a count with a similar SQL query and then use the LIMIT and OFFSET parameters to do the pagination. I have read some suggestions where the list items are counted to get the len but this does not seem satisfactory, as it would be inefficient unless there was a small LIMIT in the query (which in any case would defeat the purpose of having pagination).

Update: I just noticed paginate_by is pending deprecation.

To start with how would I add a count method to the returned object?

BitParser · Answer 1 · 2019-03-25T11:19:08.097

A more efficient solution than the other alternatives would be to write your own RawQuerySet replacement. I am showing the code below, but you can also access it as a gist here. It is absolutely not guaranteed to be error-free; nevertheless I'm using it in Django 1.11 on Python 3 (with PostgreSQL as the database; should work with MySQL too). Simply put, this class adds the corresponding LIMIT and OFFSET clauses to your raw SQL query. There's nothing crazy, just some simple string concatenation, so make sure not to include these clauses in your raw SQL query.

The class

from django.db import models
from django.db.models import sql
from django.db.models.query import RawQuerySet


class PaginatedRawQuerySet(RawQuerySet):
    def __init__(self, raw_query, **kwargs):
        super(PaginatedRawQuerySet, self).__init__(raw_query, **kwargs)
        self.original_raw_query = raw_query
        self._result_cache = None

    def __getitem__(self, k):
        """
        Retrieves an item or slice from the set of results.
        """
        if not isinstance(k, (slice, int,)):
            raise TypeError
        assert ((not isinstance(k, slice) and (k >= 0)) or
                (isinstance(k, slice) and (k.start is None or k.start >= 0) and
                 (k.stop is None or k.stop >= 0))), \
            "Negative indexing is not supported."

        if self._result_cache is not None:
            return self._result_cache[k]

        if isinstance(k, slice):
            qs = self._clone()
            if k.start is not None:
                start = int(k.start)
            else:
                start = None
            if k.stop is not None:
                stop = int(k.stop)
            else:
                stop = None
            qs.set_limits(start, stop)
            return qs

        qs = self._clone()
        qs.set_limits(k, k + 1)
        return list(qs)[0]

    def __iter__(self):
        self._fetch_all()
        return iter(self._result_cache)

    def count(self):
        if self._result_cache is not None:
            return len(self._result_cache)

        return self.model.objects.count()

    def set_limits(self, start, stop):
        limit_offset = ''

        new_params = tuple()
        if start is None:
            start = 0
        elif start > 0:
            new_params += (start,)
            limit_offset = ' OFFSET %s'
        if stop is not None:
            new_params = (stop - start,) + new_params
            limit_offset = 'LIMIT %s' + limit_offset

        self.params = self.params + new_params
        self.raw_query = self.original_raw_query + limit_offset
        self.query = sql.RawQuery(sql=self.raw_query, using=self.db, params=self.params)

    def _fetch_all(self):
        if self._result_cache is None:
            self._result_cache = list(super().__iter__())

    def __repr__(self):
        return '<%s: %s>' % (self.__class__.__name__, self.model.__name__)

    def __len__(self):
        self._fetch_all()
        return len(self._result_cache)

    def _clone(self):
        clone = self.__class__(raw_query=self.raw_query, model=self.model, using=self._db, hints=self._hints,
                               query=self.query, params=self.params, translations=self.translations)
        return clone

How to use it

A custom manager

I'm using the query set above through a custom manager:

class MyModelRawManager(models.Manager):
    def raw(self, raw_query, params=None, translations=None, using=None):
        if using is None:
            using = self.db
        return PaginatedRawQuerySet(raw_query, model=self.model, params=params, translations=translations, using=using)

    def my_raw_sql_method(some_arg):
        # set up your query and params
        query = 'your query'
        params = ('your', 'params', 'tuple')
        return self.raw(raw_query=query, params=params)

A custom pagination class

For completion, I'm also including a pagination class:

from rest_framework.pagination import PageNumberPagination


class MyModelResultsPagination(PageNumberPagination):
    """Fixed page-size pagination with 10 items."""
    page_size = 10
    max_page_size = 10

Your ListAPIView

class MyModelView(generics.ListAPIView):

    serializer_class = MyModelSerializer
    pagination_class = MyModelResultsPagination

    def get_queryset(self):
        return MyModel.raw_manager.my_raw_sql_method(some_arg)

A word of caution

The PaginatedRawQuerySet class, while functional for me, has not been extensively tested, but I believe it does give an idea of what it takes to have a solution which is more efficient than selecting all the items in your queryset for every call.

You may notice that there is a custom count method implementation (originally missing from RawQuerySet), which is calculated by calling self.model.objects.count(). Without this method, the paginator would evaluate len(your_raw_queryset), which would have the same effect on performance as the other answer.

This class is not a one-size-fits-all replacement for RawQuerySet, which means you should add your own customizations to make it fit your needs.

For example, if you need something more complicated, you could add another attribute to the PaginatedRawQuerySet class, called raw_count_query, which would then be called inside count() instead of counting all the objects the way it is right now (this would be used in cases you need filtering; the raw_count_query would provide the SQL to count the subset based on your conditions).

Pagination of raw queryset is already supported in Django 3.x, super impressed! — Dexter, Dec 04 '21 at 18:17

score 2 · Answer 2 · answered Aug 24 '15 at 22:40

2

If you cast the raw queryset to a list before returning it, that should prevent the 'RawQuerySet' has no len() error.

return list(Foo.objects.raw(query))

As you say, that will be inefficient, as it will load the entire queryset.

It might be possible to write a custom pagination class, which paginates efficiently using limit and offset, and use it in your view with the pagination_class attribute.

answered Aug 24 '15 at 22:40

Alasdair

298,606
55
578
516

I suppose using an upper LIMIT would restrict the size of the queryset. So far I have not been able to find a good example of custom pagination with raw SQL queries. – gornvix Aug 24 '15 at 23:20

score 1 · Answer 3 · answered Jul 14 '17 at 15:24

I had the same problem and I just found that instead use raw you may use extra :

(...)
return Foo.objects.extra(where=query, params=params)

extras variables

where=['data->>"$.SOMETHING" = %s OR data->>"$.SOMETHING" = %s OR data->>"$.SOMETHING" = %s', 'data->>"$.GROUP" LIKE %s'] 
params=['EX1', 'EX2', 'EX3', '%EXEMPLE4%']

Note : the main question is use a RawQuerySet with same properties of QuerySet the best way IMHO is use extra of QuerySet API , if is possible .