I'm trying to use pgvector to find similar products by name, and I've built the embedding on the name. When I use the method from the pgvector docs for getting the products within a certain distance (using alias()), the query never finishes. The other method from the docs, using order_by, works for getting the top 10, for example, but in some cases even the top 10 are too far from being similar to the product.

That's why I want to order by distance first and then filter on the distance only within the top 100, to ensure that I'm getting similar items.

The problem is that Django doesn't let me filter a queryset after slicing it. I have a workaround:

Get the top 100 ordered by distance, then use their ids to filter in a second query. This adds execution time (the table has more than 100K records).

Is there a better way of doing it? Thanks.

Slice then filter:

result = (
    self.get_queryset()
    .order_by(L2Distance('name_embedding', entry_vector))[:100]
    .alias(distance=L2Distance('name_embedding', entry_vector))
    .filter(distance__lt=threshold)
)
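
To make the intent concrete, here is a rough pure-Python sketch of the result I'd like that query to produce. It does not touch the database or Django; the function name `top_n_within` and the sample vectors are made up for illustration only.

```python
import math


def top_n_within(vectors, query, n, threshold):
    """Order by L2 distance, keep the first n, then keep only those under threshold."""
    # Pair each key with its Euclidean (L2) distance to the query vector.
    scored = [(key, math.dist(vec, query)) for key, vec in vectors.items()]
    # Rough equivalent of order_by(distance)[:n].
    nearest = sorted(scored, key=lambda pair: pair[1])[:n]
    # Rough equivalent of filter(distance__lt=threshold), applied after the slice.
    return [(key, dist) for key, dist in nearest if dist < threshold]


# Hypothetical 2-dimensional data; the real embeddings have 1536 dimensions.
sample = {
    "a": [0.0, 0.0],
    "b": [3.0, 4.0],
    "c": [0.0, 1.0],
    "d": [10.0, 10.0],
}
result = top_n_within(sample, [0.0, 0.0], n=3, threshold=2.0)
print(result)
```

With this sample, "b" makes it into the top 3 but is dropped by the threshold, which is the behaviour I want from the database query.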

My Model:

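from django.db import models
# Assumption: `_` is the usual gettext_lazy alias.
from django.utils.translation import gettext_lazy as _
from pgvector.django import VectorField

# BaseModel and the module-level app_name come from my project (not shown here).
class Product(BaseModel):
    field = "slug"

    in_menu = True
    app_name = app_name + "s"
    menu_parent = "products"

    name = models.CharField(_("Name"), max_length=200)
    key = models.CharField(_("key"), max_length=200, unique=True, default="")
    # Embedding of `name`
    name_embedding = VectorField(dimensions=1536, null=True)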
  • I don't think after slicing you get a Queryset but a list. – Ahtisham Aug 16 '23 at 02:16
  • There is a whole section in the documentation on text search. Have you taken a look at it? Here: https://docs.djangoproject.com/en/4.2/ref/contrib/postgres/search/#searchrank – Ahtisham Aug 16 '23 at 02:27
  • That's the issue: I can't create the annotation for all records, because the query takes an enormous amount of time iterating over all of them. That's why I need to get only the top relevant ones and then use annotate on them. I tried Postgres text search as below; the first doesn't find relevant rows even though they exist, and the second never completed >> `return self.get_queryset().annotate(search=vector).filter(search=query)` ||| `return Model.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')` – El hosayn Ait Ali Aug 16 '23 at 10:22
  • What is `L2Distance`? Can you also update the question with the model? – Ahtisham Aug 16 '23 at 10:26
  • The title field (name) is converted to a vector using OpenAI, then pgvector uses one of several methods to calculate the distance between vectors (I don't know the deep details). One of those methods is L2Distance (Euclidean distance), which tells me how far apart the vectors are. – El hosayn Ait Ali Aug 16 '23 at 10:33
