I have successfully clustered a bunch of vectors using the faiss kmeans. But now I am not able to store the model and load it later for inference.

clustering = faiss.Kmeans(candles.shape[1], k=clusters, niter=epochs, gpu=gpu, verbose=True)
clustering.train(X)
cluster_index = clustering.index

# failed with "don't know how to serialize this type of index"
faiss.write_index(cluster_index, f"{out_file}.faiss")
model2 = faiss.read_index(f"{out_file}.faiss")

model2.search(x, 1)
KIC

1 Answer

Looks like you can store the centroids as a numpy array or pandas DataFrame and then re-create the index:

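import faiss
import pandas as pd
# Save the centroids; rename the columns to strings, since parquet may reject integer column names
pd.DataFrame(clustering.centroids).rename(columns=str).to_parquet(filename)
# Later: reload the centroids and rebuild a flat L2 index over them
df_centroids = pd.read_parquet(filename)
cluster_index = faiss.IndexFlatL2(df_centroids.shape[1])
cluster_index.add(df_centroids.values.copy(order='C'))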

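However, this feels a little bit wrong, since the saved centroids do not record which distance measure was used for training. You have to remember it yourself and rebuild the matching index type (IndexFlatL2 here).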

KIC