2

Im using Django 2.2 and PostgreSQL 12.

Here is my model:

from django.contrib.postgres.search import SearchVectorField, SearchVector
from django.contrib.postgres.fields import JSONField

class ProfileUser(models.Model):
    name = JSONField()

    search_vector = SearchVectorField(null=True)

    class Meta:
        indexes = [
            GinIndex(fields=['search_vector'], name='user_full_name_gin_idx')
        ]

    def save(self, *args, **kwargs):
        super(ProfileUser, self).save(*args, **kwargs)
        ProfileUser.objects.update(search_vector=SearchVector('name'))

Here Im creating a new user and trying to find it:

from apps.profiles.models import ProfileUser
from django.contrib.postgres.search import SearchVector

ProfileUser.objects.create(name=[{'name': 'SomeUser', 'lang': 'en'}])
ProfileUser.objects.annotate(search=SearchVector('name')).filter(search__icontains='someuser').explain()

Result:

"Seq Scan on profiles_user (cost=0.00..81.75 rows=1 width=316)\n Filter: (upper((to_tsvector(COALESCE((name)::text, ''::text)))::text) ~~ '%someuser%'::text)"

How to make indexing working?

EDIT: As a response to @ivissani's comment, I added 5000 users and tried .filter(search__icontains='someuser') and .filter(search_vector__icontains='someuser') - same story -> Seq Scan

TitanFighter
  • 4,582
  • 3
  • 45
  • 73
  • I think you are searching over the JSONField instead of using `search_vector` because you are doing `annotate(search=SearchVector('name')).filter(search__icontains='someuser')` and I think you should be doing `.filter(search_vector__icontains='someuser')` instead – ivissani Dec 21 '19 at 19:59
  • Anyway, the database will decided whether using the index is better. If you have only one entry in your table is likely that a full scan is more efficient that an index scan – ivissani Dec 21 '19 at 20:05
  • @ivissani, hope 5k users enough :) Nothing has changed. – TitanFighter Dec 21 '19 at 20:23
  • 5,000 rows is still a fairly tiny amount of data, in relative terms. Two things I try in situations like these: load up several millions rows of data, and grab the raw SQL that Django is running, with something like debug toolbar. Then take the SQL directly into `psql` or PG Admin and tweak to see if you can get it to hit an index. That'll offer provide a clue. – FlipperPA Dec 21 '19 at 23:12

2 Answers2

6

I think you were not using completely well the full-text search Django module. The main issue I can see in your code are:

  • updating the search vector field without filtering your object
  • executing your search query on an annotated SearchVector using an icontains instead of using the SearchVectorField with your GinIndex

I updated a bit your models code:

from django.contrib.postgres.fields import JSONField
from django.contrib.postgres.indexes import GinIndex
from django.contrib.postgres.search import SearchVectorField, SearchVector
from django.db import models
from django.db.models import F


class ProfileUser(models.Model):
    name = JSONField()
    search_vector = SearchVectorField(null=True)

    class Meta:
        indexes = [GinIndex(fields=["search_vector"], name="user_full_name_gin_idx")]

    def save(self, *args, **kwargs):
        super().save(*args, **kwargs)
        ProfileUser.objects.annotate(search_vector_name=SearchVector("name")).filter(
            id=self.id
        ).update(search_vector=F("search_vector_name"))

As you can see I added an annotate and a filter in save method to update only the search vector fields of your model (You can find in another answer of mine another example of this usage)

Here you can see the code I used in the python shell to create a new ProfileUser. You can see the two SQL query executed in the save method:

>>> from users.models import ProfileUser
>>> ProfileUser.objects.create(name=[{'name': 'SomeUser', 'lang': 'en'}])

INSERT INTO "users_profileuser" ("name", "search_vector")
VALUES ('[{"name": "SomeUser", "lang": "en"}]', NULL) RETURNING "users_profileuser"."id"

UPDATE "users_profileuser"
SET "search_vector" = to_tsvector(COALESCE(("users_profileuser"."name")::text, ''))
WHERE "users_profileuser"."id" = 1

And below there the code I executed in the python shell to search the ProfileUser with the SearchVectorField using the GINindex of the model. You can see the Index Scan on the index:

>>> from django.contrib.postgres.search import SearchQuery
>>> ProfileUser.objects.filter(search_vector=SearchQuery('someuser')).explain()

EXPLAIN
SELECT "users_profileuser"."id",
    "users_profileuser"."name",
    "users_profileuser"."search_vector"
FROM "users_profileuser"
WHERE "users_profileuser"."search_vector" @@ (plainto_tsquery('someuser')) = true

"Bitmap Heap Scan on users_profileuser  (cost=12.28..21.74 rows=4 width=68)
    Recheck Cond: (search_vector @@ plainto_tsquery('someuser'::text))
    ->  Bitmap Index Scan on user_full_name_gin_idx  (cost=0.00..12.28 rows=4 width=0)
            Index Cond: (search_vector @@ plainto_tsquery('someuser'::text))"

If you want to know more about Full-text Search with Django and PostgreSQL you can read the official documentation about the full-text search.

If you are interested in external article about that here it's the one I wrote: Full-Text Search in Django with PostgreSQL

Paolo Melchiorre
  • 5,716
  • 1
  • 33
  • 52
0

Based on this article I found short solution for Django 2.2+

Model:

from django.contrib.postgres.fields import JSONField
from django.contrib.postgres.indexes import GinIndex
from django.contrib.postgres.search import SearchVectorField, SearchVector
from django.db import models


class ProfileUser(models.Model):
    name = JSONField()
    search_vector = SearchVectorField(null=True)

    class Meta:
        indexes = [GinIndex(fields=["search_vector"], name="user_full_name_gin_idx")]

    def save(self, *args, **kwargs):
        super(ProfileUser, self).save(*args, **kwargs)
        ProfileUser.objects.filter(pk=self.pk).update(search_vector=SearchVector('name'))

Query:

from django.contrib.postgres.search import SearchQuery
from apps.profiles.models import ProfileUser

ProfileUser.objects.create(name=[{'name': 'Adriano Celentano', 'lang': 'en'}])

partial_name = 'celen'  # or 'celentano adr'

query = re.sub(r'[!\'()|&]', ' ', partial_name).strip()
if query:
    query = re.sub(r'\s+', ' & ', query)
    query += ':*'  # -> 'celen:*' or 'celentano & adr:*'

    # Please note, that `search_type` parameter was added to Django 2.2.
    ProfileUser.objects.filter(search_vector=SearchQuery(query, search_type='raw')).explain()

Such SearchQuery allows to search names partially (ie "starting with" approach, example: can find "celen" but can not find "lent") and case insensitively. If you need to "lent" part, probably you need to use Trigram Similarity as shown in @paolo-melchiorre 's article

"Bitmap Heap Scan on profiles_user (cost=13.03..194.69 rows=101 width=333)\n Recheck Cond: (search_vector @@ to_tsquery('celen:'::text))\n -> Bitmap Index Scan on user_full_name_gin_idx (cost=0.00..13.01 rows=101 width=0)\n
Index Cond: (search_vector @@ to_tsquery('celen:
'::text))"

P.S. Regarding icontains and contains I found in different sources, that they always do Sequential Scan.


One more possibly useful article.

TitanFighter
  • 4,582
  • 3
  • 45
  • 73