0

I have worked in creating a vectorstore from a series of paragraphs from a text document. The text of the document has been splitted in non-overlapping paragraphs for a good reason, as these represents different informations. These paragraphs have metadata that has been included

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.schema import Document
import time


paragraphs_document_list = []

for paragraph in paragraph_list:
    documents_list.append(Document(page_content=paragraph,
metadata=dict(paragraph_id=paragraph_id,
                       page=pageno))


db = FAISS.from_documents(documents = paragraphs_document_list,
                          embedding = OpenAIEmbeddings(model="gpt-4")
                         )

Normally, I could query the general content of my document asking a question about its whole.

  qa_chain = RetrievalQA.from_chain_type(
                llm=ChatOpenAI(temperature = 0.0, model='gpt-4'), 
                chain_type="stuff", 
                retriever=db_test.as_retriever(), 
                verbose=False
                )

    label_output = qa_chain.run(query="What is this document about?")

However, I would like to, instead, retrieve the different embeddings in my FAISS vectorstore and then query those individually, using as query something like "What's this paragraph about?".

Is there any option to query a specific embedding or to use as retriever a single specific embedding? In any case, I would like to gain access of the original paragraph I'm querying together with its metadata.

I tried filtering using metadata to answer based on a specific paragraph:

filter_dict = {"paragraph_id":19, "page":5}

results = db.similarity_search(query, filter=filter_dict, k=1, fetch_k=1)

0 Answers0