1

I'd like to use the following raw query in a ListAPIView in the Django REST Framework. I'm not sure how to use a raw SQL query instead of a queryset to support pagination. I've looked into it and didn't find much that made sense to me.

If I want to use a query set, it would look like this How do I do this if I want to use raw SQL queries?

class VideoListView(generics.ListAPIView):
    queryset = Video.objects.raw(...)
    serializer_class = VideoSerializer
SELECT DISTINCT "t"."id",
                "t"."title",
                "t"."thumbnail_url",
                "t"."preview_url",
                "t"."embed_url",
                "t"."duration",
                "t"."views",
                "t"."is_public",
                "t"."published_at",
                "t"."created_at",
                "t"."updated_at",
                EXISTS
  (SELECT (1) AS "a"
   FROM "videos_history" U0
   WHERE (U0."user_id" IS NULL
          AND U0."video_id" = "t"."id")
   LIMIT 1) AS "is_viewed",
                EXISTS
  (SELECT (1) AS "a"
   FROM "videos_favorite" U0
   WHERE (U0."user_id" IS NULL
          AND U0."video_id" = "t"."id")
   LIMIT 1) AS "is_favorited",
                EXISTS
  (SELECT (1) AS "a"
   FROM "videos_track" U0
   INNER JOIN "videos_playlist" U1 ON (U0."playlist_id" = U1."id")
   WHERE (U1."is_wl"
          AND U1."user_id" IS NULL
          AND U0."video_id" = "t"."id")
   LIMIT 1) AS "is_wl"
FROM (
  (SELECT "videos_video".*
   FROM "videos_video"
   WHERE ("videos_video"."is_public"
      AND "videos_video"."published_at" <= '2022-01-03 05:20:16.725884+00:00'
      AND "videos_video"."title" LIKE '%word')
   ORDER BY "videos_video"."published_at" DESC
   LIMIT 20)
UNION
  (SELECT "videos_video".*
   FROM "videos_video"
   LEFT OUTER JOIN "videos_video_tags" ON ("videos_video"."id" = "videos_video_tags"."video_id")
   LEFT OUTER JOIN "videos_tag" ON ("videos_video_tags"."tag_id" = "videos_tag"."id")
   WHERE ("videos_video"."is_public"
      AND "videos_video"."published_at" <= '2022-01-03 05:20:16.725884+00:00'
      AND "videos_tag"."name" LIKE '%word')
   ORDER BY "videos_video"."published_at" DESC
   LIMIT 20)
) AS t
ORDER BY "t"."published_at" DESC
LIMIT 20;

Do I need to create a custom paginator or something? I'm even more confused because I'm using LIMIT and other methods for subqueries.

--- Add models.py, etc... ---

# models.py
class Tag(models.Model):
    name = models.CharField(unique=True, max_length=30)
    created_at = models.DateTimeField(default=timezone.now)
    ...


class Video(models.Model):
    title = models.CharField(max_length=300)
    tags = models.ManyToManyField(Tag, blank=True)
    updated_at = models.DateTimeField(auto_now=True)
    ...


class History(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    video = models.ForeignKey(Video, on_delete=models.CASCADE)
    ...


class Favorite(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE
    video = models.ForeignKey(Video, on_delete=models.CASCADE)
    ...


class Playlist(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    is_wl = models.BooleanField(default=False, editable=False)
    ...


class Track(models.Model):
    playlist = models.ForeignKey(Playlist, on_delete=models.CASCADE, null=True)
    video = models.ForeignKey(Video, on_delete=models.CASCADE)
    ...

The slow queryset and SQL queries that were originally used.

Video.objects.annotate(
    is_viewed=Exists(History.objects.filter(user=user, video=OuterRef("pk"))),
    is_favorited=Exists(
        Favorite.objects.filter(user=user, video=OuterRef("pk"))
    ),
    is_wl=Exists(
        Track.objects.filter(
            playlist__user=user, playlist__is_wl=True, video=OuterRef("pk")
        )
    ),
).filter(
    Q(title__contains=value)
    | Q(tags__name__contains=value)),
    is_public=True,
    published_at__lte=timezone.now(),
).order_by("-published_at").distinct()
SELECT DISTINCT "videos_video"."id",
                "videos_video"."published_at",
                EXISTS
  (SELECT (1) AS "a"
   FROM "videos_history" U0
   WHERE (U0."user_id" IS NULL
          AND U0."video_id" = "videos_video"."id")
   LIMIT 1) AS "is_viewed",
                EXISTS
  (SELECT (1) AS "a"
   FROM "videos_favorite" U0
   WHERE (U0."user_id" IS NULL
          AND U0."video_id" = "videos_video"."id")
   LIMIT 1) AS "is_favorited",
                EXISTS
  (SELECT (1) AS "a"
   FROM "videos_track" U0
   INNER JOIN "videos_playlist" U1 ON (U0."playlist_id" = U1."id")
   WHERE (U1."is_wl"
          AND U1."user_id" IS NULL
          AND U0."video_id" = "videos_video"."id")
   LIMIT 1) AS "is_wl"
FROM "videos_video"
LEFT OUTER JOIN "videos_video_tags" ON ("videos_video"."id" = "videos_video_tags"."video_id")
WHERE ("videos_video"."is_public"
       AND "videos_video"."published_at" <= '2021-12-27 13:34:29.103369+00:00'
       AND ("videos_video"."title" &@~ 'word'
            OR "videos_video_tags"."tag_id" IN
              (SELECT U0."id"
               FROM "videos_tag" U0
               WHERE U0."name" &@~ 'word')))
ORDER BY "videos_video"."published_at" DESC
LIMIT 20;
Jvn
  • 425
  • 4
  • 9
  • Why do you use a raw query in the first place? Django's pagination will work on a `QuerySet` by using `OFFSET` and `LIMIT`, and on a list, but when using a list, it will first fetch *all* records, hence one loses the potential efficiency gains by doing pagination on the database level. – Willem Van Onsem Jan 06 '22 at 17:21
  • The SQL generated by the ORM was very slow, so I rewrote the SQL myself. However, I don't know how to reproduce this SQL query in Queryset. – Jvn Jan 06 '22 at 17:24
  • can you provide the `Video`, `History`, Favorite`, etc. models? – Willem Van Onsem Jan 06 '22 at 17:27
  • I omitted a few things, but I added models.py and the slow query set and SQL query that I originally used. – Jvn Jan 06 '22 at 17:34
  • What if you use `is_viewed=Exists(History.objects.filter(user=user, video=OuterRef("pk"))[:1])` for the `Exists` queries, and aded `[:20]` at thend, thus adding limits like in your raw query. – Willem Van Onsem Jan 06 '22 at 18:26
  • I think I need to use union in the FROM clause, how do I do that in the queryset?There is `.union()`, but it is UNIONed in the WHERE clause. – Jvn Jan 07 '22 at 03:45
  • no, the union is just a trick to select items that have the given title, or have a tag which contains `word`. – Willem Van Onsem Jan 07 '22 at 10:02

1 Answers1

0

If you are asking about generic PageNumberPagination from rest_framework.pagination, usually it's done this way. You make your own pagination class with default values:

class VideoPagination(PageNumberPagination):
    max_page_size = 100
    page_size_query_param = 'page_size'
    page_size = 25

Then you just add it to your view:

class VideoListView(generics.ListAPIView):
    queryset = Video.objects.all()
    serializer_class = VideoSerializer
    pagination_class = VideoPagination

That's all, you can use parameters page and page_size.

Yevgeniy Kosmak
  • 3,561
  • 2
  • 10
  • 26
  • I apologize for the poor explanation. I need to use the raw SQL query mentioned in the question in `objects.raw()` to queryset, and I don't know how to combine `objects.raw()` with DRF pagination. – Jvn Jan 06 '22 at 17:44
  • @Jvn then you need [this](https://stackoverflow.com/a/43921793/12914274). – Yevgeniy Kosmak Jan 06 '22 at 17:54
  • Thanks. This seems quite old, does Django currently not have such a feature as standard? – Jvn Jan 07 '22 at 08:28
  • As far as I can see, there isn’t one. However, it’s not surprising for me. This task is quite rare since most DB requests could be made with ORM fairly feasibly. One thing I found on the subject — [this repository](https://github.com/mblance/django-paginator-rawqueryset). Wouldn’t advise using that. But for me, it’s proof that the case is unpopular. – Yevgeniy Kosmak Jan 07 '22 at 11:00