5

In Django I have this:

models.py

class Book(models.Model):
    isbn = models.CharField(max_length=16, db_index=True)
    title = models.CharField(max_length=255, db_index=True)
    ... other fields ...

class Author(models.Model):
    first_name = models.CharField(max_length=128, db_index=True)
    last_name = models.CharField(max_length=128, db_index=True)
    books = models.ManyToManyField(Book, blank=True)
    ... other fields ...

admin.py

class AuthorAdmin(admin.ModelAdmin):
    search_fields = ('first_name', 'last_name', 'books__isbn', 'books__title')

    ...

My problem is when I do a search from the Author admin list page with 2 or more short terms, MySQL start to take a lot of time (at least 8 sec. for a 3 terms query). I have around 5000 Authors and 2500 Books. The short here is very important. If I search for 'a b c', so 3 really short terms, I'm not enough patient to wait for the result (I waited at least 2 min.). Instead if I search for 'all bee clue' I got the result in 2 sec. So the problem look the be really with short terms on related fields.

The SQL query resulting from this search have a lot of JOIN, LIKE, AND and OR but no subquery.

I'm using MySQL 5.1 but I tried with 5.5 with no more success.

I also tried to increase the innodb_buffer_pool_size to a really large value. That change nothing.

The only idea I have right now to improve the performance is to denormalize to isbn and title field (ie copy them directly into Authors) but I will have to add a bunch of mechanics to keep these fields in sync with the real ones in Book.

Any suggestions on how to improve this query?

Etienne
  • 12,440
  • 5
  • 44
  • 50

2 Answers2

10

After a lot of investigations I found that the problem come from how the search query is built for the admin search field (in the ChangeList class). In a multi-terms search (words separated by space) each term is added to the QuerySet by chaining a new filter(). When there's one or more related fields in the search_fields, the created SQL query will have a lot of JOIN chained one after the other with many JOIN for each related field (see my related question for some examples and more info). This chain of JOIN is there so that each term will be search only in the subset of data filter by the precedent term AND, most important, that a related field need to only have one term (vs needing to have ALL terms) to make a match. See Spanning multi-valued relationships in the Django docs for more info on this subject. I'm pretty sure it's the behavior wanted most of the time for the admin search field.

The drawback of this query (with related fields involved) is that the variation in performance (time to perform the query) can be really large. It depends on a lot of factors: number of searched terms, terms searched, kind of field search (VARCHAR, etc.), number of field search, data in the tables, size of the tables, etc. With the right combination it's easy to have a query that will take mostly forever (a query that take more then 10 min. for me is a query that take forever in the context of this search field).

The reason why it can take so long is that the database need to create a temporary table for each term and scan it mostly entirely to search for the next term. So, this adds up really quickly.

A possible change to do to improve the performance is to ANDed all terms in the same filter(). This way their will be only one JOIN by related field (or 2 if it's a many to many) instead of many more. This query will be a lot faster and with really small performance variation. The drawback is that related fields will have to have ALL the terms to match, so, you can get less matches in many cases.

UPDATE

As asked by trinchet here’s what’s needed to do the change of search behavior (for Django 1.7). You need to override the get_search_results() of the admin classes where you want this kind of search. You need to copy all the method code from the base class (ModelAdmin) to your own class. Then you need to change those lines:

for bit in search_term.split():
    or_queries = [models.Q(**{orm_lookup: bit})
                  for orm_lookup in orm_lookups]
    queryset = queryset.filter(reduce(operator.or_, or_queries))

To that:

and_queries = []
for bit in search_term.split():
    or_queries = [models.Q(**{orm_lookup: bit})
                  for orm_lookup in orm_lookups]
    and_queries.append(Q(reduce(operator.or_, or_queries)))
queryset = queryset.filter(reduce(operator.and_, and_queries))

This code is not tested. My original code was for Django 1.4 and I just adapt it for 1.7 here.

Community
  • 1
  • 1
Etienne
  • 12,440
  • 5
  • 44
  • 50
  • 1
    Hi @Etienne, could it possible to post here how exactly you solve this?, i meant, how you introduced all the terms in the same filter? Thanks! – trinchet Sep 30 '14 at 12:54
  • Thanks @Etienne, I'm very grateful for. Here another good solution for Django < 1.6 http://stackoverflow.com/questions/14426692/django-admin-search-with-multiple-words/14490862#14490862 – trinchet Oct 02 '14 at 21:22
  • @trinchet This other solution did not do the same thing. It forces Django to make the search on the whole string instead of searching for each term (with OR between them). It can definitely speed up the query, but by doing another kind of compromise. So, it depends of each use case to choose the best compromise. – Etienne Oct 02 '14 at 21:37
  • you are right, this doesnt get the same result, i just posted as another solution to overwrite the default django search query. Thanks for point on. – trinchet Oct 03 '14 at 14:54
1

You can redefine get_changelist for ModelAdmin subclass and try to optimize query manually there. For example, ISBN can be looked up with exact match instead of icontains, and you can add subqueries on Book to work faster.

ilvar
  • 5,718
  • 1
  • 20
  • 17