I am using LangChain to create embeddings and then ask a question against those embeddings, like so:
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake
from langchain.vectorstores.base import VectorStoreRetriever

# deeplake_url, question and chat_history are defined elsewhere
embeddings: OpenAIEmbeddings = OpenAIEmbeddings(disallowed_special=())
db = DeepLake(
    dataset_path=deeplake_url,
    read_only=True,
    embedding_function=embeddings,
)
retriever: VectorStoreRetriever = db.as_retriever()
model = ChatOpenAI(model_name="gpt-3.5-turbo")
qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)
result = qa({"question": question, "chat_history": chat_history})
But I am getting the following error:
File "/xxxxx/openai/api_requestor.py", line 763, in _interpret_response_line
raise self.handle_error_response(
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 13918 tokens. Please reduce the length of the messages.
The chat_history is empty and the question itself is quite short.
How can I reduce the number of tokens being passed to OpenAI?
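One idea I had was to ask the retriever for fewer documents, since each retrieved chunk presumably ends up in the prompt. This is only a sketch reusing the db and model from above, and the k value is just a guess on my part:

# Sketch: return fewer chunks from the vector store so less text
# gets stuffed into the prompt. The value of k is arbitrary here.
retriever = db.as_retriever(search_kwargs={"k": 4})
qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)
result = qa({"question": question, "chat_history": chat_history})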
I'm assuming the documents coming back from the vector store are too large when they are passed to OpenAI. It might be easy enough to just truncate the data before it is sent to OpenAI.
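Along those lines, I was thinking of counting tokens with tiktoken and cutting each retrieved document down to a budget before it reaches the model, but I'm not sure where to hook this into the chain. A rough sketch of the idea (the max_tokens budget is arbitrary):

import tiktoken

# Sketch: cut a piece of text down to a fixed token budget for gpt-3.5-turbo.
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def truncate_to_tokens(text: str, max_tokens: int = 3000) -> str:
    tokens = enc.encode(text)
    if len(tokens) <= max_tokens:
        return text
    return enc.decode(tokens[:max_tokens])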