
index = faiss.IndexFlatL2(vectormatrix.shape[1])

print(index.is_trained)

faiss.normalize_L2(vectormatrix)

index.add(vectormatrix)

print(index.ntotal)

Distance, Index = index.search(token_vector.reshape((1, token_vector.size)), k)

1 Answer


I have almost the same issue, but with inner product. For normalized vectors the scores should be in the range [-1, 1], but I get values like 100 or 200.

%%time
k = 255

dim = X.shape[1]
quantiser = faiss.IndexFlatIP(dim)
index = faiss.IndexIVFFlat(quantiser, dim, k)

faiss.normalize_L2(X)

index.train(X)
index.add(X)

sample = ['some text']
query = scipy.sparse.csr_matrix.toarray(vectorizer.transform(sample))
index.nprobe = 100
D, I = index.search(query, 10)
print(D[0])

> array([73.49516 , 73.504524, 73.75489 , 73.767204, 73.78795 ,
> 73.800064, 73.80722 , 73.82175 , 73.94714 , 74.034   ], dtype=float32)

I'm trying to solve this now.

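Passing faiss.METRIC_INNER_PRODUCT as the fourth argument to faiss.IndexIVFFlat() partially solved my problem. Without it the index uses the default L2 metric, even though the quantiser is an IndexFlatIP.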
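To see why the metric matters, here is a small NumPy-only sketch of the arithmetic (it does not call faiss itself). It assumes random unit-length vectors as stand-ins for the real data; the names inner and sq_l2 are made up for illustration. With the L2 metric faiss reports squared Euclidean distances, which for unit vectors relate to the inner product as 2 - 2 * (a . b), so the two metrics give different numbers and opposite sort orders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 5 database vectors and 1 query, all scaled to unit length.
X = rng.standard_normal((5, 8)).astype(np.float32)
q = rng.standard_normal((1, 8)).astype(np.float32)
X /= np.linalg.norm(X, axis=1, keepdims=True)
q /= np.linalg.norm(q, axis=1, keepdims=True)

# Score an inner-product index would report: larger means more similar.
inner = (X @ q.T).ravel()

# Score an L2 index would report (squared distance): smaller means more similar.
sq_l2 = ((X - q) ** 2).sum(axis=1)

# For unit vectors the two are tied together by: squared L2 = 2 - 2 * inner.
assert np.allclose(sq_l2, 2.0 - 2.0 * inner, atol=1e-5)

print("inner product:", inner)
print("squared L2   :", sq_l2)
```

This identity only holds when both the stored vectors and the query have unit length.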

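UPDATE:

Add

faiss.normalize_L2(query)

after

query = scipy.sparse.csr_matrix.toarray(vectorizer.transform(sample))

The indexed vectors were normalized but the query was not, so the scores were scaled by the query's length. With both changes in place the scores fall in the expected range.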
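The effect of leaving the query un-normalized can be checked with plain NumPy. This is only an illustrative sketch under assumed data: the arrays are random, the factor 50.0 is arbitrary, and raw_query, unit_query and the other names are hypothetical. It shows that inner products against an un-normalized query are the cosine scores multiplied by the query's norm, which is how values far outside [-1, 1] can appear.

```python
import numpy as np

rng = np.random.default_rng(0)

# Database rows scaled to unit length, similar in effect to faiss.normalize_L2(X).
X = rng.standard_normal((4, 6)).astype(np.float32)
X /= np.linalg.norm(X, axis=1, keepdims=True)

# A query with a deliberately large norm, and the same query scaled to unit length.
raw_query = 50.0 * rng.standard_normal((1, 6)).astype(np.float32)
unit_query = raw_query / np.linalg.norm(raw_query, axis=1, keepdims=True)

raw_scores = (X @ raw_query.T).ravel()    # not bounded by [-1, 1]
unit_scores = (X @ unit_query.T).ravel()  # cosine similarities

# The raw scores are the cosine scores multiplied by the query's norm.
query_norm = float(np.linalg.norm(raw_query))
assert np.allclose(raw_scores, query_norm * unit_scores, rtol=1e-4, atol=1e-3)

print("query norm :", query_norm)
print("raw scores :", raw_scores)
print("unit scores:", unit_scores)
```

Normalizing the query changes only the scale of the scores, not the ranking of the neighbours.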

Fedor