For the past few weeks I have been working on a QA retrieval chatbot project with LangChain and OpenAI in Python. I have an ingest pipeline set up in a Google Colab notebook, which extracts text from PDFs, creates embeddings, and stores them in FAISS vectorstores that I then use to test my LangChain chatbot (a Streamlit Python app). I have a bunch of vectorstores (one per PDF) that I have created in the past few days.

The Google Colab pipeline simply takes the extracted PDF pages, creates LangChain documents, and finally embeds them and saves the vectorstore with the following code:

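import pickle
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(docs, embeddings)
with open("file.pkl", "wb") as f:
    pickle.dump(vectorstore, f)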

I would then manually download the FAISS vectorstore file.pkl and store it on my local machine in a db folder, which my Streamlit app loads as follows:

import os
import pickle

if os.path.exists(f"db/{filename}.pkl"):
    with open(f"db/{filename}.pkl", "rb") as f:
        vectorstore = pickle.load(f)

As of today (Monday 3 July), any new FAISS vectorstore that I create with my Google Colab notebook fails to load in my app. I get an exception saying that the variable "vectorstore" is not defined.

I thought I'd try downloading the notebook and creating the vectorstore locally, but the result was the same.

Alas, I had not been paying attention to which versions of langchain and openai were being installed every time I ran my Colab notebook. Fearing that the problem might be due to some update, I pinned the same versions in both my Google Colab notebook and my local environment:

langchain==0.0.205
openai==0.27.8
streamlit==1.22.0
faiss-cpu==1.7.4
tiktoken==0.4.0
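
Since the pinned list above only helps if both environments really match, one option is to record the installed versions next to each pickle and compare them before loading. The following is a minimal sketch using only the standard library; the helper names `installed_versions` and `version_mismatches` and the idea of a version manifest saved beside file.pkl are assumptions for illustration, not part of the original pipeline.

```python
import json
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def version_mismatches(recorded, current):
    """Return {package: (recorded, current)} for every package whose version differs."""
    return {
        name: (recorded[name], current.get(name))
        for name in recorded
        if recorded[name] != current.get(name)
    }


# Packages taken from the pinned list above.
PACKAGES = ["langchain", "openai", "faiss-cpu", "tiktoken"]

# Where the vectorstore is built: save this JSON string next to file.pkl.
manifest = json.dumps(installed_versions(PACKAGES), indent=2)

# Where the vectorstore is loaded: compare before calling pickle.load.
mismatches = version_mismatches(json.loads(manifest), installed_versions(PACKAGES))
if mismatches:
    print("Version mismatch between save and load environments:", mismatches)
```

An empty `mismatches` dict means the loading environment reports the same versions as the one that wrote the pickle.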

Now the vectorstore gets loaded in the app, but I get the following error:

AttributeError: 'OpenAIEmbeddings' object has no attribute 'deployment'

If I create the vectorstore from the same notebook on my local machine, I get the following error:

AttributeError: 'OpenAIEmbeddings' object has no attribute 'headers'

Updating to the latest versions of langchain and openai does not help. I tried downgrading the langchain version, but eventually I reach one that no longer supports gpt-3.5-turbo-16k (the model used in my app) and I get a different kind of error when running my app.

Nothing else has changed: my app launches fine, and the vectorstores I had created in the past few days work fine. Only the new vectorstores that I create no longer work.

What could have happened?

Bert

1 Answer

I found this resource: https://dagster.io/blog/training-llms

In order to generate the VectorStore and save it as a pkl file, they run the following:

from dagster import asset
from langchain.vectorstores.faiss import FAISS
from langchain.embeddings import OpenAIEmbeddings
import pickle

@asset
def vectorstore(documents):
    vectorstore_contents = FAISS.from_documents(documents, OpenAIEmbeddings())
    with open("vectorstore.pkl", "wb") as f:
        pickle.dump(vectorstore_contents, f)

Subsequently, having saved the pkl file locally, they read it back as a LangChain VectorStore object. I've tried this, and it loaded the pkl object as a VectorStore object with all of its attributes.

from langchain.vectorstores import VectorStore
import pickle

vectorstore_file = "vectorstore.pkl"

with open(vectorstore_file, "rb") as f:
    # Unpickle the saved FAISS store; the annotation only documents the expected type.
    local_vectorstore: VectorStore = pickle.load(f)
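
As background on the AttributeError messages in the question, one plausible explanation (an assumption, not something confirmed in this thread) is a mismatch between the library version that wrote the pickle and the one that reads it. pickle restores an object's saved attributes without calling `__init__`, so an attribute introduced in a newer class definition is simply missing on the restored object. The sketch below reproduces that effect with the standard library only; `fake_embeddings_lib` and its `Embeddings` class are hypothetical stand-ins, not LangChain code.

```python
import pickle
import sys
import types

# Hypothetical stand-in for an installed library; not a real package.
fake_lib = types.ModuleType("fake_embeddings_lib")
sys.modules[fake_lib.__name__] = fake_lib


def install_release(cls):
    """Register cls as fake_embeddings_lib.Embeddings, replacing any earlier release."""
    cls.__module__ = fake_lib.__name__
    cls.__qualname__ = cls.__name__  # lets pickle find the class by name
    setattr(fake_lib, cls.__name__, cls)
    return cls


@install_release
class Embeddings:
    """'Old release': instances carry only a model attribute."""

    def __init__(self):
        self.model = "text-embedding-ada-002"


# Saved under the old release, like a .pkl written a few days earlier.
old_pickle = pickle.dumps(Embeddings())


@install_release
class Embeddings:
    """'New release': __init__ sets an extra attribute that methods rely on."""

    def __init__(self):
        self.model = "text-embedding-ada-002"
        self.deployment = "text-embedding-ada-002"

    def describe(self):
        return f"{self.model} (deployment={self.deployment})"


# Loading succeeds: pickle restores the saved __dict__ without calling __init__.
restored = pickle.loads(old_pickle)

try:
    restored.describe()
    error_message = None
except AttributeError as exc:
    # The restored object never received the attribute added in the new release.
    error_message = str(exc)

print(error_message)
```

The load itself succeeds and the failure only appears when a method touches the missing attribute, which matches the shape of the errors reported in the question. If this is the cause, saving and loading with the same pinned langchain version should avoid it. LangChain's FAISS wrapper also provides `save_local` and `load_local`, where the embeddings object is passed in at load time instead of being unpickled.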

Hope this helps!

  • This does not really answer the question. If you have a different question, you can ask it by clicking [Ask Question](https://stackoverflow.com/questions/ask). To get notified when this question gets new answers, you can [follow this question](https://meta.stackexchange.com/q/345661). Once you have enough [reputation](https://stackoverflow.com/help/whats-reputation), you can also [add a bounty](https://stackoverflow.com/help/privileges/set-bounties) to draw more attention to this question. - [From Review](/review/late-answers/34889857) – Gugu72 Aug 25 '23 at 21:43
  • @Gugu72 Hello, can you explain how I should modify my answer to make it acceptable? I tried the solution I posted above and it is the correct solution to the question. – hopeonthestreet Aug 27 '23 at 22:15
  • Hi, you can remove the first sentence that looks like a question, and elaborate a little more on the link. In the future, this link might not be accessible anymore, so at least an extract (important info, not the whole resource) should be present in your answer to make it more relevant. :) – Gugu72 Aug 28 '23 at 09:49
  • @Gugu72 Absolutely! That's great feedback. If possible, please let me know if my edited answer is more acceptable. Thank you! – hopeonthestreet Aug 29 '23 at 17:19
  • That is an amazing edit, giving important info while crediting the resource you found, upvoted :) – Gugu72 Aug 29 '23 at 17:20