Is there any way to load an index created through VectorstoreIndexCreator in langchain? How does it work?

Question

I am experimenting with LangChain and its applications, but as a newbie I can't quite understand how embeddings and indexing work together here. I know what each of them is, but I can't figure out how to use the index that I created and saved using persist_directory.

I successfully saved the object created by VectorstoreIndexCreator using the following code:

```python
index = VectorstoreIndexCreator(vectorstore_kwargs={"persist_directory": "./custom_save_dir_path"}).from_loaders([loader])
```

but I cannot find a way to use the .pkl files created. How can I use these files in my chain to retrieve data?

Also, how does billing work with OpenAI? If I cannot use any saved embeddings or index, will it re-embed all the data every time I run the code? As a beginner, I am still finding my way around, and any help would be greatly appreciated.

Here is the full code:

```python
from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
import os

os.environ["OPENAI_API_KEY"] = "sk-xxx"

# Load the documents
loader = CSVLoader(file_path='data/data.csv')

# Create the index with VectorstoreIndexCreator; persist_directory tells the vector store where to save it
index = VectorstoreIndexCreator(vectorstore_kwargs={"persist_directory": "./custom_save_dir_path"}).from_loaders([loader])

# Create a question-answering chain using the index
chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",
    retriever=index.vectorstore.as_retriever(),
    input_key="question",
)

# Pass queries to the chain in a loop
while True:
    query = input("query: ")
    response = chain({"question": query})
    print(response['result'])
```

score 1 · Accepted Answer · answered May 06 '23 at 15:09


By default, VectorstoreIndexCreator uses Chroma as its vector store, and the Chroma versions current at the time use DuckDB as a transient backend that keeps the data in memory. If you want to persist the data, you have to explicitly persist the Chroma database and load it when needed (for example, load the data when the database already exists, otherwise build it and persist it).
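A minimal sketch of that load-or-persist pattern for the code in the question, assuming the 2023-era LangChain API (langchain 0.0.x with Chroma 0.3.x) and the default OpenAIEmbeddings; `load_or_build_index` is a hypothetical helper name. Since VectorstoreIndexCreator stores into Chroma, the saved directory can be reopened with `Chroma(persist_directory=..., embedding_function=...)` and wrapped in `VectorStoreIndexWrapper`, the same type that `from_loaders` returns. In later releases these imports moved (e.g. to `langchain_community`) and Chroma 0.4+ writes to disk automatically, so adjust for your versions.

```python
import os


def load_or_build_index(csv_path: str, persist_dir: str):
    """Reload a persisted Chroma index if one exists, otherwise build and persist it.

    Hypothetical helper; assumes langchain 0.0.x with Chroma 0.3.x.
    """
    # Imports are kept local to the function so this file can be imported without LangChain installed.
    from langchain.document_loaders import CSVLoader
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.indexes import VectorstoreIndexCreator
    from langchain.indexes.vectorstore import VectorStoreIndexWrapper
    from langchain.vectorstores import Chroma

    # Must be the same embedding model that was used when the index was built.
    embeddings = OpenAIEmbeddings()

    if os.path.isdir(persist_dir) and os.listdir(persist_dir):
        # An index already exists on disk: reopen it, no documents are re-embedded.
        vectorstore = Chroma(persist_directory=persist_dir, embedding_function=embeddings)
        return VectorStoreIndexWrapper(vectorstore=vectorstore)

    # First run: embed the CSV rows (billed by OpenAI) and write them to disk.
    index = VectorstoreIndexCreator(
        embedding=embeddings,
        vectorstore_kwargs={"persist_directory": persist_dir},
    ).from_loaders([CSVLoader(file_path=csv_path)])
    index.vectorstore.persist()  # explicit flush to disk (needed with Chroma 0.3.x)
    return index
```

With this helper, the question's script would call `index = load_or_build_index("data/data.csv", "./custom_save_dir_path")` and keep the `RetrievalQA.from_chain_type(..., retriever=index.vectorstore.as_retriever(), ...)` line unchanged.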

For more details about Chroma, see the Chroma documentation.

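The embeddings (i.e. the vectors stored in the vector database) are created by OpenAI: VectorstoreIndexCreator uses OpenAIEmbeddings by default, separately from the OpenAI() LLM in your chain. So whenever you process your data and store it in the vector store, you incur OpenAI charges for embedding it; if you load the vector store from the database instead, you won't be charged for embedding those documents again. Each query still costs a little, because the question is embedded for retrieval and the LLM generates the answer.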
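To see why reloading saves money, here is a toy, library-free model of the idea (not LangChain code): `CountingEmbedder` is a hypothetical stand-in for the paid embedding call that counts how often it runs, and `load_or_build` caches the vectors in a JSON file. Embedding happens once per document on the first run; later runs only read the file.

```python
import json
import os
import tempfile


class CountingEmbedder:
    """Hypothetical stand-in for a paid embedding API: counts every call."""

    def __init__(self) -> None:
        self.calls = 0

    def embed(self, text: str) -> list[float]:
        self.calls += 1  # in the real setup, each call here would be billed
        # Toy 2-D "embedding": text length and number of vowels.
        return [float(len(text)), float(sum(c in "aeiou" for c in text.lower()))]


def load_or_build(docs: list[str], path: str, embedder: CountingEmbedder) -> dict[str, list[float]]:
    """Load vectors from `path` if it exists; otherwise embed `docs` and save them there."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)  # reloading from disk makes no embedding calls
    store = {doc: embedder.embed(doc) for doc in docs}  # one call per document
    with open(path, "w") as f:
        json.dump(store, f)
    return store


if __name__ == "__main__":
    docs = ["apple pie", "banana split", "cherry tart"]
    path = os.path.join(tempfile.mkdtemp(), "store.json")
    embedder = CountingEmbedder()
    load_or_build(docs, path, embedder)  # first run: embeds and saves
    print(embedder.calls)  # 3
    load_or_build(docs, path, embedder)  # second run: loads from disk
    print(embedder.calls)  # still 3
```

The question's script has the same shape of problem in reverse: `from_loaders` always takes the "build" branch, so every run pays for embedding again.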


chekkal


hey man, did you figure it out? I'm also a newbie, and what I'm trying to do is basically give the LLM memory and access to my own data in some way other than the prompt, so I also need to connect a chain object to a vector store – pawelek69420 Jun 25 '23 at 15:22
Yes! You can use `persist_directory` to save the vector store. After splitting your documents and defining the embeddings you want to use, you can save your index like this: `from langchain.vectorstores import Chroma`, `persist_directory = [The directory you want to save in]`, `docsearch = Chroma.from_documents(documents=docs, embedding=embeddings, persist_directory=persist_directory)`. Later on you can load it with `vectordb = Chroma(persist_directory=persist_directory, embedding_function=embeddings)` and use `vectordb` for retrieval. Hope it helps! – AqashaT Jul 27 '23 at 08:53
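A sketch of that split, save, and reload flow, ending in a RetrievalQA chain like the one in the question, again assuming the 2023-era LangChain API (langchain 0.0.x with Chroma 0.3.x). The helper names `save_index` and `qa_chain_from_disk` are hypothetical, and the chunk size of 1000 characters is an arbitrary illustrative choice.

```python
def save_index(docs, persist_dir: str):
    """Split LangChain Documents into chunks, embed them, and persist them to persist_dir."""
    # Local imports so this file can be imported without LangChain installed.
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.text_splitter import CharacterTextSplitter
    from langchain.vectorstores import Chroma

    # Assumed chunking parameters; tune for your data.
    splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    chunks = splitter.split_documents(docs)
    docsearch = Chroma.from_documents(
        documents=chunks, embedding=OpenAIEmbeddings(), persist_directory=persist_dir
    )
    docsearch.persist()  # explicit flush to disk (needed with Chroma 0.3.x)
    return docsearch


def qa_chain_from_disk(persist_dir: str):
    """Reopen a persisted Chroma store and wrap it in a RetrievalQA chain (no re-embedding)."""
    from langchain.chains import RetrievalQA
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.llms import OpenAI
    from langchain.vectorstores import Chroma

    vectordb = Chroma(persist_directory=persist_dir, embedding_function=OpenAIEmbeddings())
    return RetrievalQA.from_chain_type(
        llm=OpenAI(), chain_type="stuff", retriever=vectordb.as_retriever()
    )
```

You would call `save_index` once (for example with `CSVLoader(...).load()` as `docs`), then use `qa_chain_from_disk` in later runs; with the default input key the chain is called as `chain({"query": ...})`.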
