
This is something of a design question. I am a vector DB newbie. I am working on creating an LLM-enabled summarisation system for a huge set of documents. Each document contains a certain date, and users will search by these dates.

When a user searches, I iterate through this structure and create a summary view through an LLM (a custom model based on GPT4All).

I have chosen FAISS with LangChain. Right now I am creating persisted, date-centric vector DBs under a specific subject, like below:

<Subject>
...<dt-1>
...<dt-2>
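
For illustration, here is a minimal sketch of how that per-date layout could be scanned to decide which persisted indexes to load for a query. The function name and directory convention are my own (hypothetical), matching the `<Subject>/<dt-1>`, `<Subject>/<dt-2>` layout above:

```python
from pathlib import Path

def list_date_indexes(subject_dir):
    """Return the date-partition subdirectories under a subject directory.

    Each subdirectory is assumed to hold one persisted FAISS index
    (hypothetical layout: <Subject>/<date>/...). The caller can then
    load only the partitions matching the user's date filter.
    """
    root = Path(subject_dir)
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

A date filter then becomes a simple string comparison on the returned names before any index is loaded, which avoids opening partitions that cannot match.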

I have created my own embeddings but am planning to switch to Hugging Face's sentence-transformers. I have also created and trained an LLM based on Llama weights.
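
On swapping embeddings: as far as I understand, LangChain only expects an object exposing `embed_documents` and `embed_query`, so a custom embedder can be dropped in wherever `HuggingFaceEmbeddings` would go. A toy stand-in (not a real embedding model, just illustrating the expected shape) might look like:

```python
import hashlib

class ToyEmbeddings:
    """Illustrative stand-in matching LangChain's Embeddings interface.

    Real use would substitute HuggingFaceEmbeddings / sentence-transformers;
    this just hashes text deterministically into a fixed-size vector.
    """

    def __init__(self, dim=8):
        self.dim = dim

    def embed_query(self, text):
        # Deterministic pseudo-embedding: bytes of a SHA-256 digest,
        # scaled into [0, 1]. Only useful for wiring/tests.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [digest[i % len(digest)] / 255.0 for i in range(self.dim)]

    def embed_documents(self, texts):
        return [self.embed_query(t) for t in texts]
```

The point is only the interface: anything with these two methods can back a FAISS store, which makes it easy to A/B the custom embeddings against sentence-transformers later.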

The code below performs the similarity search:

def similarity_search(query, index):
    """Return the top-k matching documents plus a lightweight source list."""
    matched_docs = index.similarity_search(query, k=5)
    sources = [
        {
            "page_content": doc.page_content,
            "metadata": doc.metadata,
        }
        for doc in matched_docs
    ]
    return matched_docs, sources

I want to stick with LangChain. Is there a way to scan multiple documents (i.e. multiple per-date indexes) and use them with an LLM?
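
One approach I have considered: run the same query against each per-date FAISS index and merge the scored hits in plain Python, then hand the merged documents to the LLM. A minimal sketch with a stand-in for the store objects (the function name is mine; the assumed method is LangChain's `similarity_search_with_score(query, k)`, where lower FAISS scores mean closer matches):

```python
import heapq

def search_many(indexes, query, k=5):
    """Query several vector stores and keep the overall top-k hits.

    `indexes` is a list of objects exposing
    similarity_search_with_score(query, k) -> [(doc, score), ...],
    as LangChain's FAISS wrapper does. FAISS returns a distance by
    default, so smaller scores are better.
    """
    hits = []
    for index in indexes:
        hits.extend(index.similarity_search_with_score(query, k=k))
    # Keep the k smallest distances across all partitions.
    return heapq.nsmallest(k, hits, key=lambda pair: pair[1])
```

If the partitions share the same embedding model, I believe LangChain's FAISS class also offers `merge_from` to physically combine indexes into one store, which may be simpler than merging results at query time; I have not benchmarked either approach.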

Tanmoy
