I have 100K known embedding i.e.
[emb_1, emb_2, ..., emb_100000]
Each of this embedding is derived from GPT-3 sentence embedding with dimension 2048.
My task is given an embedding(embedding_new
) find the closest 10 embedding from the above 100k
embedding.
The way I am approaching this problem is brute force.
Every time a query asks to find the closest embeddings, I compare embedding_new
with [emb_1, emb_2, ..., emb_100000]
and get the similarity score.
Then I do quicksort of the similarity score to get the top 10
closest embedding.
Alternatively, I have also thought about using Faiss.
Is there a better way to achieve this?