How to deal with Interference in Large Model-Driven Vector Databases for Textual Similarity?

Asked Aug 21 '23 at 09:18

Active Aug 21 '23 at 09:18

I'm using a vector database for text similarity search, and the more I use it, the more I suspect there is a problem with how short texts are ranked.

Currently, I use OpenAI's embeddings API to convert text into vectors and store them in the vector database. However, shorter texts seem to crowd out longer, more relevant texts in the search results.

For example:

Query: What is B of A?

Text1: A's xxxx [dozens of texts here], B is xxx.

Text2: A's c

Text3: d's B

By vector similarity, Text2 and Text3 can rank as closer to the query than Text1. However, the expected result is Text1, because it is the only text that actually says what B of A is.

Could you please provide any suggestions on how to address this issue?

I am currently using Euclidean distance (L2). Should I replace it?
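
A minimal sketch of how the two metrics relate, assuming the stored vectors are unit length (OpenAI documents its embeddings as normalized to length 1). Under that assumption, squared L2 distance equals 2 - 2 * cosine similarity, so both metrics should produce the same ranking. The random vectors and helper names below are made up for illustration; they are not real embeddings or any vector database's API.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def l2_distance(a, b):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def rank_by_l2(query, docs):
    """Indices of docs ordered from smallest to largest L2 distance."""
    return sorted(range(len(docs)), key=lambda i: l2_distance(query, docs[i]))


def rank_by_cosine(query, docs):
    """Indices of docs ordered from highest to lowest cosine similarity."""
    return sorted(range(len(docs)), key=lambda i: -cosine_similarity(query, docs[i]))


# Placeholder data: random unit vectors standing in for real embeddings.
rng = random.Random(0)
dim = 8
query = normalize([rng.gauss(0, 1) for _ in range(dim)])
docs = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(5)]

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b).
for d in docs:
    lhs = l2_distance(query, d) ** 2
    rhs = 2 - 2 * cosine_similarity(query, d)
    assert math.isclose(lhs, rhs, abs_tol=1e-9)

same_ranking = rank_by_l2(query, docs) == rank_by_cosine(query, docs)
print(same_ranking)
```

If the same identity holds for the real stored vectors, switching from L2 to cosine similarity alone would probably not change which texts are returned first.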

asked Aug 21 '23 at 09:18

lybtt

0 Answers