When I use the following code, which summarizes long PDFs, it works fine for the first PDF. But when I run it on a second PDF (i.e., I change the file path to another PDF), it still outputs the summary of the first PDF, as if the embeddings from the first PDF/previous run were somehow stored and never deleted.
from langchain.document_loaders import PyPDFLoader # for loading the pdf
from langchain.embeddings import OpenAIEmbeddings # for creating embeddings
from langchain.vectorstores import Chroma # for the vectorization part
from langchain.chains import ChatVectorDBChain # for chatting with the pdf
from langchain.llms import OpenAI # the LLM we'll use (ChatGPT)
import os
os.environ["OPENAI_API_KEY"] = "my_API_KEY"
pdf_path = "file_path"
loader = PyPDFLoader(pdf_path)
pages = loader.load_and_split()
print(pages[1].page_content) # sanity check: print the second page's text
embeddings = OpenAIEmbeddings()
vectordb = Chroma.from_documents(pages, embedding=embeddings,
persist_directory=".")
vectordb.persist()
pdf_qa = ChatVectorDBChain.from_llm(OpenAI(temperature=0.9, model_name="gpt-3.5-turbo"),
vectordb, return_source_documents=True)
query = "Write a summary of the text."
result = pdf_qa({"question": query, "chat_history": ""})
print(result["answer"])
This behavior persists even after restarting Python, and with a number of other PDFs. I started renaming all the objects, and sometimes that helps. But right now, even after renaming everything, it still outputs the summary of the previous PDF. I'm very confused by this behavior.
Any clue how I can delete the vectors from the previous run, or otherwise fix this?
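For what it's worth, one thing I'm considering is simply wiping the persisted vector-store files between runs, so the next Chroma.from_documents call starts from an empty store. I'm not sure this is the right fix; below is a minimal sketch of the idea using only the standard library (the directory and filename are just placeholders for whatever Chroma actually writes into persist_directory):

```python
import os
import shutil
import tempfile

# "db_dir" stands in for the real persist_directory; with
# persist_directory="." the Chroma files land next to the script.
db_dir = tempfile.mkdtemp()

# Simulate a leftover artifact from a previous run (placeholder filename).
open(os.path.join(db_dir, "chroma-embeddings.parquet"), "w").close()

shutil.rmtree(db_dir)               # delete everything from the previous round
os.makedirs(db_dir, exist_ok=True)  # recreate it empty for the next run

print(os.listdir(db_dir))  # → []
```

Would that be safe to do before each new PDF, or is there a proper API for clearing the old vectors?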