Query filter based on GenericForeignKey's fields

Question

Here is the simplified problem, I have a Book model:

class Book(models.Model):
     language = models.CharField(max_length=2, choices=LANGUAGE_CHOICES)
     ...

I am using django-hitcount to count views to my Books (probably no one knows it because it is an old project), anyway let me roughly summarize it: It creates a HitCount object with an hits counter and a GenericForeignKey to the respective object.

I would like to get the 15 books of a certain language with more hits, obviously ordered by hits.

I already looked at this question and it helped me to figure it out my (partial) solution, divided into 3 queries:

Get all the Books' IDs of a certain language;
Get the first 15 HitCounts related_object's IDs which are also in the first list;
Get the Books with the IDs taken in step 2;

Translated into code:

content_type = ContentType.objects.get_for_model(Book)
books = tuple(Books.objects.filter(
        language=language).values_list('id', flat=True))

all_time = list(HitCount.objects.filter(
        content_type=content_type,
        object_pk__in=books).values_list(
        'object_pk', 'hits').order_by('-hits')[:15])

all_time_ids = [i[0] for i in all_time]

best_of_all_time = Books.objects.select_related(
        'manga').filter(pk__in=all_time_ids)

This approach presents two problems:

With a big database (like mine) the second query becomes really expensive;
With the last query i lose the order by hits from the second query;

Anyone have a suggestion?

catavaran · Answer 1 · 2014-12-30T08:49:48.023

1

Instead of a list of ids pass a queryset to object_pk__in condition. Django is smart enough to translate this into SQL subquery so all expenses will be handled by SQL server which is smart too :-)
Use queryset's in_bulk() method to get an easy accessible dict of Books.

So code will look something like this:

# just queryset instead of tuple of ids
books = Books.objects.filter(language=language).values_list('id', flat=True)
rating = list(HitCount.objects.filter(content_type=content_type,
                                      object_pk__in=books)
                              .values_list('object_pk', 'hits')
                              .order_by('-hits')[:15])

book_ids = [r[0] for r in rating]

# dict of Books with book.pk as a key
books_d = Books.objects.select_related('manga').in_bulk(book_ids)

# list of tuples (book, hits) ordered by -hits
best_of_all_time = [books_d[pk], hits for pk, hits in rating]

edited Dec 30 '14 at 08:49

answered Dec 30 '14 at 08:37

catavaran

44,703
8
98
85

Could you explain to me why is using in_bulk better than using filter? – Jiloc Dec 30 '14 at 16:49
`in_bulk()` is not better than `filter()` :-) It just allows you construct **ordered** list of books in one line (last line in my example). It is not about performance - SQL query for both `in_bulk` and `filter` version will be the same. – catavaran Dec 30 '14 at 23:19
The real performance gain is the first two lines of code in example. Using SQL subquery instead of loading list of ids into memory is a huge win in terms of speed. – catavaran Dec 30 '14 at 23:23

Query filter based on GenericForeignKey's fields

1 Answers1