I am using LangChain to build software for reading PDF documents and answering questions about them, which requires building vector embeddings for the text in each document. While researching, I came across platforms such as MongoDB and Pinecone, and libraries such as FAISS, that are considered among the best for similarity search over vector embeddings. From what I have read, `langchain.vectorstores.FAISS` keeps the vector embeddings in cloud memory or RAM (I am not sure which), so they cannot be used once the code block has terminated, and Pinecone is built on the FAISS algorithm. Because of Pinecone's cost, I am thinking of going with MongoDB instead. LangChain has a class, `langchain.vectorstores.MongoDBAtlasVectorSearch`, which saves the vector embeddings in MongoDB. Is `MongoDBAtlasVectorSearch` built on FAISS? Also, are there any other recommendations for platforms that can store vector embeddings for a longer period of time with multiple index values?

1 Answer

(MongoDB Employee here)

MongoDB's Atlas Vector Search is not built on FAISS, but it does use an HNSW graph to provide fast and efficient vector search over the data in your MongoDB collections.
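As a rough illustration of what querying such an index can look like from Python, the sketch below builds a `$vectorSearch` aggregation pipeline. It assumes a cluster recent enough to support the `$vectorSearch` stage and an existing vector index; the index name `vector_index`, the field names `embedding` and `text`, and the helper `build_vector_search_pipeline` are all hypothetical names chosen for the example.

```python
from typing import Any, Dict, List, Sequence


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    index_name: str = "vector_index",  # hypothetical index name
    path: str = "embedding",           # hypothetical field holding the vectors
    k: int = 4,
    num_candidates: int = 100,
) -> List[Dict[str, Any]]:
    """Return an aggregation pipeline for an approximate nearest neighbor query."""
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,  # candidates considered by the ANN search
                "limit": k,                       # results actually returned
            }
        },
        # Keep only the text (assumed field name) and the similarity score.
        {"$project": {"_id": 0, "text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


# Usage (needs pymongo and an Atlas cluster with a vector index on `path`):
# from pymongo import MongoClient
# collection = MongoClient("<your Atlas URI>")["mydb"]["chunks"]
# for doc in collection.aggregate(build_vector_search_pipeline(query_embedding)):
#     print(doc["score"], doc["text"])
```

Because the index lives in Atlas, nothing has to be rebuilt in application memory between runs; the pipeline is the only client-side piece.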

baf509
  • So does this mean that, in LangChain, I can use `langchain.vectorstores.MongoDBAtlasVectorSearch` instead of `langchain.vectorstores.FAISS`, since FAISS does not keep the data for a longer period of time, whereas I can store the vector embeddings in MongoDB and use them later whenever I need them? – Shuhul Handoo Jul 31 '23 at 05:58
  • Yes, that is accurate. When using Vector Search on MongoDB Atlas, your indexes are persisted and kept up to date in the database. – baf509 Aug 01 '23 at 12:11
  • Is there any way to store my vector embeddings in a MongoDB database and, when it is time to retrieve those vectors, use the FAISS algorithm for the similarity search? – Shuhul Handoo Aug 01 '23 at 12:55
  • You could store your vectors in MongoDB, then read them back and build an index with FAISS to query, but that would just be using the database for storage. If you use Atlas Vector Search, MongoDB Atlas builds an index that lets you run an approximate nearest neighbor search over your data whenever you want, without needing to rebuild the index with FAISS. – baf509 Aug 05 '23 at 20:21
  • Do you have any idea of how that can be done? Any leads? – Shuhul Handoo Aug 05 '23 at 20:53
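
One way the "MongoDB for storage, FAISS for search" approach from the comments could look is sketched below. This is a minimal sketch, not a recommended design: it assumes each document stores its vector as a list of floats under a field named `embedding`, and the helper names (`split_ids_and_vectors`, `build_faiss_index`, `search`) are made up for the example. It uses an exact `faiss.IndexFlatL2` index, and the index has to be rebuilt from the database every time the process starts.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple


def split_ids_and_vectors(
    docs: Iterable[Dict[str, Any]], vector_field: str = "embedding"
) -> Tuple[List[Any], List[List[float]]]:
    """Separate document ids from their vectors, checking the dimensions agree."""
    ids: List[Any] = []
    vectors: List[List[float]] = []
    for doc in docs:
        vec = [float(x) for x in doc[vector_field]]
        if vectors and len(vec) != len(vectors[0]):
            raise ValueError("all vectors must have the same dimension")
        ids.append(doc["_id"])
        vectors.append(vec)
    return ids, vectors


def build_faiss_index(vectors: Sequence[Sequence[float]]):
    """Build an in-memory exact L2 index (requires faiss and numpy)."""
    import faiss
    import numpy as np

    if not vectors:
        raise ValueError("no vectors to index")
    matrix = np.asarray(vectors, dtype="float32")  # FAISS expects float32 rows
    index = faiss.IndexFlatL2(matrix.shape[1])
    index.add(matrix)
    return index


def search(index, ids: Sequence[Any], query_vector: Sequence[float], k: int = 4):
    """Return (document id, squared L2 distance) pairs for the k nearest vectors."""
    import numpy as np

    query = np.asarray([query_vector], dtype="float32")
    distances, positions = index.search(query, k)
    # FAISS pads with -1 when fewer than k vectors are indexed.
    return [
        (ids[pos], float(dist))
        for dist, pos in zip(distances[0], positions[0])
        if pos != -1
    ]


# Usage (needs pymongo, faiss, numpy, and a populated collection):
# from pymongo import MongoClient
# collection = MongoClient("<your MongoDB URI>")["mydb"]["chunks"]
# ids, vectors = split_ids_and_vectors(collection.find({}, {"embedding": 1}))
# index = build_faiss_index(vectors)
# print(search(index, ids, query_embedding, k=4))
```

The row position in the FAISS index is mapped back to the MongoDB `_id` through the `ids` list, so the matching documents can be fetched from the collection afterwards.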