I'm trying to use pgvector to find similar products by name, and I've built the embeddings from the product name. When I use the method from the pgvector docs that filters products within a certain distance using alias(), the query never finishes executing. The other method, using order_by, works for getting the top 10, for example, but in some cases even the top 10 are too far away to count as similar to the product.
That's why I want to order by distance, take the top 100, and then filter only those 100 by a distance threshold, to make sure I'm getting items that are actually similar.
The problem is that Django doesn't let me filter a queryset after slicing it. I have a workaround:
get the top 100 ordered by distance, then use their ids to filter in a second query. That adds execution time, since the table has more than 100K records.
Is there a better way of doing this? Thanks.
Slice then filter:
result = (
    self.get_queryset()
    .order_by(L2Distance('name_embedding', entry_vector))[:100]
    .alias(distance=L2Distance('name_embedding', entry_vector))
    # Django rejects this filter() because the queryset is already sliced
    .filter(distance__lt=threshold)
)
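One thing that may be worth checking: because the rows are ordered by distance, taking the first 100 and then dropping the ones past the threshold should give the same rows as dropping the ones past the threshold first and then taking the first 100. The plain-Python sketch below illustrates that with made-up 2-D vectors. The helper names and the data are hypothetical, and it does not touch Django or pgvector at all.

```python
import math


def slice_then_filter(items, query, threshold, limit):
    """Order by distance, keep the first `limit`, then drop anything too far away."""
    ranked = sorted(items, key=lambda v: math.dist(v, query))
    return [v for v in ranked[:limit] if math.dist(v, query) < threshold]


def filter_then_slice(items, query, threshold, limit):
    """Drop anything too far away first, then order by distance and keep `limit`."""
    close = [v for v in items if math.dist(v, query) < threshold]
    return sorted(close, key=lambda v: math.dist(v, query))[:limit]


# Hypothetical 2-D vectors standing in for name_embedding values.
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0), (0.5, 0.5), (5.0, 0.0)]
query = (0.0, 0.0)

a = slice_then_filter(vectors, query, threshold=2.5, limit=5)
b = filter_then_slice(vectors, query, threshold=2.5, limit=5)
print(a == b)
```

If that equivalence holds for the real data, the ORM chain could presumably put the slice last: alias the distance, filter on distance__lt, order by the alias, and only then take [:100], so nothing is filtered after the slice. Whether that is fast enough on 100K+ rows is not something this sketch can show.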
My Model:
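from django.db import models
from django.utils.translation import gettext_lazy as _  # assumed source of _
from pgvector.django import VectorField


# BaseModel and app_name are defined elsewhere in the project
class Product(BaseModel):
    field = "slug"
    in_menu = True
    app_name = app_name + "s"
    menu_parent = "products"
    name = models.CharField(_("Name"), max_length=200)
    key = models.CharField(_("key"), max_length=200, unique=True, default="")
    name_embedding = VectorField(dimensions=1536, null=True)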