This is more of a design question; I am a VectorDB newbie. I am building an LLM-enabled summarisation system for a large set of documents. Each document contains a specific date, and users will search by these dates.
When a user searches, I iterate through this structure and create a summary view through the LLM (a custom model based on GPT4All).
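For context, the summary view step looks roughly like this (a minimal sketch; the model path, the map_reduce chain type and docs_for_date are placeholders for my actual setup):

from langchain.llms import GPT4All
from langchain.chains.summarize import load_summarize_chain

# Placeholder path -- in reality this is my custom GPT4All-based model
llm = GPT4All(model="./models/my-custom-model.bin")

# docs_for_date: the LangChain Documents retrieved for the queried date
summarize_chain = load_summarize_chain(llm, chain_type="map_reduce")
summary = summarize_chain.run(docs_for_date)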
I have chosen FAISS with LangChain. Right now I am creating persisted, date-centric vector stores under a specific subject, like below (see the sketch after the structure for how each per-date index is built and saved):
<Subject>
    <dt-1>
    <dt-2>
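Each per-date index is built and persisted along these lines (a minimal sketch; the paths, splitter settings and the embeddings object are placeholders for my actual setup):

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

def build_date_index(documents, embeddings, subject, date_str):
    # Chunk the raw documents before embedding
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_documents(documents)

    # Build and persist the FAISS index for this subject/date
    index = FAISS.from_documents(chunks, embeddings)
    index.save_local(f"./indexes/{subject}/{date_str}")
    return index

def load_date_index(embeddings, subject, date_str):
    # Reload a persisted per-date index from disk
    return FAISS.load_local(f"./indexes/{subject}/{date_str}", embeddings)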
I have created my own embeddings but plan to switch to Hugging Face's sentence-transformers. I have also created and trained an LLM based on LLaMA weights.
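The planned switch to sentence-transformers would look roughly like this (the model name is just an example):

from langchain.embeddings import HuggingFaceEmbeddings

# Example sentence-transformers model; any model from the hub could be used here
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")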
The code below performs the similarity search:
def similarity_search(query, index):
    # Pull the top-5 most similar chunks for the query from the given FAISS index
    matched_docs = index.similarity_search(query, k=5)
    sources = []
    for doc in matched_docs:
        sources.append(
            {
                "page_content": doc.page_content,
                "metadata": doc.metadata,
            }
        )
    return matched_docs, sources
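It is called along these lines (the index path, date and query are placeholders):

from langchain.vectorstores import FAISS

# Load one persisted per-date index and query it
index = FAISS.load_local("./indexes/my-subject/2023-05-01", embeddings)
matched_docs, sources = similarity_search("what happened around this date?", index)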
I want to stick to LangChain. Is there a way to search across multiple of these documents/per-date indexes and feed the combined results to an LLM? Something like the sketch below is what I am aiming for.
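A rough sketch of what I have in mind, assuming FAISS.merge_from and load_qa_chain can be combined this way (paths, dates and the question are placeholders; llm, embeddings and similarity_search are defined above):

from langchain.chains.question_answering import load_qa_chain
from langchain.vectorstores import FAISS

# Merge several per-date indexes into one searchable index
combined = FAISS.load_local("./indexes/my-subject/dt-1", embeddings)
for date_str in ["dt-2", "dt-3"]:
    other = FAISS.load_local(f"./indexes/my-subject/{date_str}", embeddings)
    combined.merge_from(other)

# Feed the matched chunks from all dates to the LLM in one go
query = "what happened across these dates?"
matched_docs, _ = similarity_search(query, combined)
qa_chain = load_qa_chain(llm, chain_type="stuff")
answer = qa_chain.run(input_documents=matched_docs, question=query)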