How can I use FAISS (Facebook AI Similarity Search) to compute the cosine similarity between a text and a list of target texts, and return the maximum cosine similarity together with the target text that is most similar?
This is what I have done so far:
import faiss
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
# Preprocess data as needed
documents = [
"This is the first document"
]
documents2 = [
"first doc",
"This is the first document"
]
# Use TF-IDF to convert the text documents into numerical vectors
vectorizer = TfidfVectorizer()
data = vectorizer.fit_transform(documents)
data = data.toarray().astype(np.float32) # FAISS expects dense float32 vectors
# Normalize the vectors
data = data / np.linalg.norm(data, axis=1, keepdims=True)
# Create an index using FAISS
index = faiss.IndexFlatIP(data.shape[1]) # Create an index with the same number of dimensions as your data
index.add(data) # Add your data to the index
# Search for the nearest neighbors of a given text document
# The text documents to find the nearest neighbors of (TfidfVectorizer L2-normalizes each row by default, norm="l2")
query = vectorizer.transform(documents2).toarray().astype(np.float32)
k = 1 # The number of nearest neighbors to return (the index holds only one document, so k = 2 would return a padded -1 entry)
distances, indices = index.search(query, k)
# Take the best similarity score and the best-matching indexed document for each query text
similarity_scores = distances[:, 0] # The inner product is equal to the cosine of the angle between the normalized vectors
best_matches = [documents[i] for i in indices[:, 0]]
However, this does not give the result I am looking for. Can someone please guide me?