I'm using a vector database, and the more I use it, the more I suspect there is a problem with my retrieval results.

Currently, I use OpenAI's embeddings API to convert text into vectors and store them in the vector database. However, shorter texts seem to add a lot of noise to the search results.

For example:

Query: What is B of A?

Text1: A's xxxx [dozens of texts here], B is xxx.

Text2: A's c

Text3: d's B

In terms of vector similarity, Text2 and Text3 may be ranked as closer to the query than Text1. However, the result I expect is Text1, since it is the only one that actually contains the answer.

Could you please provide any suggestions on how to address this issue?

I am currently using Euclidean distance (L2). Should I replace it?
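
To make the metric question concrete, here is a minimal sketch that uses random unit vectors as stand-ins for real embeddings (the helper names are made up for illustration, and no real API is called). As far as I understand, OpenAI's embeddings are normalized to length 1, and for unit vectors the squared L2 distance equals 2 - 2 * cosine similarity, so both metrics should give the same ranking:

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def l2_distance(a, b):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_l2(query, docs):
    """Document indices ordered from smallest to largest L2 distance."""
    return sorted(range(len(docs)), key=lambda i: l2_distance(query, docs[i]))


def rank_by_cosine(query, docs):
    """Document indices ordered from largest to smallest cosine similarity."""
    return sorted(range(len(docs)), key=lambda i: -cosine_similarity(query, docs[i]))


# Assumption: random unit vectors stand in for real embedding vectors.
rng = random.Random(0)
dim = 8
query = normalize([rng.gauss(0, 1) for _ in range(dim)])
docs = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(5)]

print("L2 ranking:    ", rank_by_l2(query, docs))
print("Cosine ranking:", rank_by_cosine(query, docs))
```

If that holds for my real vectors too, switching the metric alone would not change which texts come back first, so the problem would have to be addressed some other way.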

lybtt
