0

Here is the simplified problem, I have a Book model:

class Book(models.Model):
     language = models.CharField(max_length=2, choices=LANGUAGE_CHOICES)
     ...

I am using django-hitcount to count views to my Books (probably no one knows it because it is an old project), anyway let me roughly summarize it: It creates a HitCount object with an hits counter and a GenericForeignKey to the respective object.

I would like to get the 15 books of a certain language with more hits, obviously ordered by hits.

I already looked at this question and it helped me to figure it out my (partial) solution, divided into 3 queries:

  1. Get all the Books' IDs of a certain language;
  2. Get the first 15 HitCounts related_object's IDs which are also in the first list;
  3. Get the Books with the IDs taken in step 2;

Translated into code:

content_type = ContentType.objects.get_for_model(Book)
books = tuple(Books.objects.filter(
        language=language).values_list('id', flat=True))

all_time = list(HitCount.objects.filter(
        content_type=content_type,
        object_pk__in=books).values_list(
        'object_pk', 'hits').order_by('-hits')[:15])

all_time_ids = [i[0] for i in all_time]

best_of_all_time = Books.objects.select_related(
        'manga').filter(pk__in=all_time_ids)

This approach presents two problems:

  1. With a big database (like mine) the second query becomes really expensive;
  2. With the last query i lose the order by hits from the second query;

Anyone have a suggestion?

Community
  • 1
  • 1
Jiloc
  • 3,338
  • 3
  • 24
  • 38

1 Answers1

1
  1. Instead of a list of ids pass a queryset to object_pk__in condition. Django is smart enough to translate this into SQL subquery so all expenses will be handled by SQL server which is smart too :-)

  2. Use queryset's in_bulk() method to get an easy accessible dict of Books.

So code will look something like this:

# just queryset instead of tuple of ids
books = Books.objects.filter(language=language).values_list('id', flat=True)
rating = list(HitCount.objects.filter(content_type=content_type,
                                      object_pk__in=books)
                              .values_list('object_pk', 'hits')
                              .order_by('-hits')[:15])

book_ids = [r[0] for r in rating]

# dict of Books with book.pk as a key
books_d = Books.objects.select_related('manga').in_bulk(book_ids)

# list of tuples (book, hits) ordered by -hits
best_of_all_time = [books_d[pk], hits for pk, hits in rating]
catavaran
  • 44,703
  • 8
  • 98
  • 85
  • Could you explain to me why is using in_bulk better than using filter? – Jiloc Dec 30 '14 at 16:49
  • `in_bulk()` is not better than `filter()` :-) It just allows you construct **ordered** list of books in one line (last line in my example). It is not about performance - SQL query for both `in_bulk` and `filter` version will be the same. – catavaran Dec 30 '14 at 23:19
  • The real performance gain is the first two lines of code in example. Using SQL subquery instead of loading list of ids into memory is a huge win in terms of speed. – catavaran Dec 30 '14 at 23:23