
Suppose I have the following table:

import uuid

from django.db.models import CharField, Model, UUIDField
from django.utils.translation import ugettext_lazy as _


class Word(Model):
    id = UUIDField(primary_key=True,
                   default=uuid.uuid4,
                   editable=False)
    name = CharField(_('Word'),
                     db_index=True,
                     max_length=250)

This table contains around 3 million rows, mostly Cyrillic words.

What I want to accomplish is to search for a predefined list of values. The problem is that the words in the table are accented.

The table's content looks like:

мо́рква
буря́к
я́блуко

But I query without accents:

words = ['морква', 'буряк', 'яблуко']

import operator
from functools import reduce
from django.db.models import F, Func, Q, Value

Word.objects.annotate(name_without_accent=Func(
    F('name'), Value('[%s]' % ACCENTS), Value(''),  # ACCENTS is defined elsewhere
    function='regexp_replace',
)).filter(reduce(operator.or_, [Q(name_without_accent__icontains=w) for w in words], Q()))

The problem is that this query runs very long on such a big table. The words list can also contain more elements (around 20-30).

Is there any way to improve the overall performance of querying such a big table? Improve the Django query? Tune Postgres? Use some external tools?


1 Answer


Try using `unicodedata.normalize('NFD', word)` instead of the regexp. And if you run a lot of such queries, it would be reasonable to add an extra field with unaccented versions of the words and build an index on that field, because in your situation `db_index` doesn't help at all; it just slows the table down with reindexing.
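For example, a minimal sketch of the idea (the `strip_accents` helper and the `name_unaccented` field are illustrative names, not from the question):

import unicodedata

def strip_accents(s):
    # NFD decomposes 'о́' into 'о' plus a combining acute accent (U+0301);
    # dropping combining marks (category 'Mn') leaves the bare letters.
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

class Word(Model):
    ...  # id and name as in the question
    name_unaccented = CharField(max_length=250, db_index=True, editable=False)

    def save(self, *args, **kwargs):
        # Keep the unaccented copy in sync on every save.
        self.name_unaccented = strip_accents(self.name)
        super(Word, self).save(*args, **kwargs)

Existing rows need a one-off backfill; after that the filter can run directly against `name_unaccented` (e.g. `Q(name_unaccented__icontains=w)`) without a per-row `regexp_replace`.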

  • Where should I use `unicodedata.normalize('NFD', word)`? – Paul R Jun 20 '17 at 20:46
  • In the annotation field – vZ10 Jun 21 '17 at 03:09
  • Forget my previous comment about the annotation ((( It was too early in the morning for answering, sorry ((( I think the best solution would be to create an additional field or even an additional temporary table – vZ10 Jun 21 '17 at 07:27
  • Did as you suggested, but `contains` is still very slow. How can I improve it? The query runs 7 seconds (instead of 65 as before). – Paul R Jun 21 '17 at 22:14
  • 7 seconds even after indexing that field? By the way, if you index it using Django migrations, check in the db whether it really creates the index. Sometimes Django indexing has no effect. – vZ10 Jun 22 '17 at 05:38
  • I used the scheme above, just to test that it works: `Word.objects.filter(reduce(operator.or_, [Q(name__contains=w) for w in words], Q()))` – Paul R Jun 22 '17 at 07:00
  • I didn't get it: did you add an additional field with unaccented words? – vZ10 Jun 22 '17 at 09:02
  • No. I just tested whether plain field comparison would be faster than the regex. It is faster, but still slow. – Paul R Jun 22 '17 at 09:13
  • Tell me more about your use case: why do you use Q objects and `contains`? Can the 'name' field hold a longer string of which only a part is 'буряк'? – vZ10 Jun 22 '17 at 09:49
  • Installed a GIN index (https://stackoverflow.com/questions/43793987/how-to-create-gin-index-in-django-migration). No effect. – Paul R Jun 22 '17 at 12:07
  • Yes. I copied the query from Django, because it contains several words: `SELECT "word_ukword"."id", "word_ukword"."name" FROM "word_ukword" WHERE ("word_ukword"."name"::text LIKE '%морква%' OR "word_ukword"."name"::text LIKE '%моркві%' OR "word_ukword"."name"::text LIKE '%моркви%' OR "word_ukword"."name"::text LIKE '%моркво%' OR "word_ukword"."name"::text LIKE '%морквою%' OR "word_ukword"."name"::text LIKE '%моркву%')` – Paul R Jun 22 '17 at 12:40
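For `LIKE '%…%'` searches like the query in the last comment, the plain B-tree index that `db_index=True` creates cannot be used at all; to make a GIN index help here, it has to be a trigram index from PostgreSQL's `pg_trgm` extension. A minimal sketch of a raw-SQL migration, assuming the `word_ukword` table from the query above (the migration dependency and index name are illustrative):

from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [('word', '0001_initial')]  # hypothetical

    operations = [
        # pg_trgm provides the trigram operator classes used below.
        migrations.RunSQL('CREATE EXTENSION IF NOT EXISTS pg_trgm;'),
        # A GIN trigram index lets Postgres answer LIKE/ILIKE '%...%'
        # patterns from the index instead of a sequential scan.
        migrations.RunSQL(
            'CREATE INDEX word_ukword_name_trgm_idx ON word_ukword '
            'USING gin (name gin_trgm_ops);',
            reverse_sql='DROP INDEX word_ukword_name_trgm_idx;',
        ),
    ]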