
I am using LangChain to create embeddings and then ask a question against those embeddings like so:

# langchain 0.0.x import paths
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake
from langchain.vectorstores.base import VectorStoreRetriever

embeddings: OpenAIEmbeddings = OpenAIEmbeddings(disallowed_special=())
db = DeepLake(
    dataset_path=deeplake_url,
    read_only=True,
    embedding_function=embeddings,
)
retriever: VectorStoreRetriever = db.as_retriever()
model = ChatOpenAI(model_name="gpt-3.5-turbo")
qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)
result = qa({"question": question, "chat_history": chat_history})

But I am getting the following error:

File "/xxxxx/openai/api_requestor.py", line 763, in _interpret_response_line
    raise self.handle_error_response(
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 13918 tokens. Please reduce the length of the messages.

The chat_history is empty and the question is quite small.

How can I reduce the size of tokens being passed to OpenAI?

I'm assuming the response from the embeddings being passed to openai is too large. It might be easy enough to just figure out how to truncate the data being sent to openai.

  • This implies that either the documents you want to query or the chat history is too long. Can you show how long the chat history is, and use `return_source_documents` to show how long those are? https://python.langchain.com/en/latest/modules/chains/index_examples/chat_vector_db.html#return-source-documents – Nick ODell Jun 11 '23 at 18:52
  • chat history is empty, and the question is quite short. I'll update the question. I'm assuming the embeddings it is passing to openai are too big, but not sure how to see them and manually truncate them. – Patrick Collins Jun 11 '23 at 19:02
  • Langchain is not passing embeddings to your language model. It is passing the documents associated with each embedding, which are text. Although the embeddings are a fixed size, the documents could potentially be any size, depending on how you split your documents. (Or if you split them at all.) – Nick ODell Jun 11 '23 at 19:12
  • Right, the text associated with the embeddings that we are passing is too big. It probably would be easiest if I could find out where those are being passed – Patrick Collins Jun 11 '23 at 19:20

1 Answer


Summary

When you initialize the ConversationalRetrievalChain object, pass in a max_tokens_limit value.

qa = ConversationalRetrievalChain.from_llm(
        model, retriever=retriever, max_tokens_limit=4000
    )

This will automatically truncate the retrieved documents so they fit under the token limit when querying openai / your LLM.
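Putting it together with the snippet from the question, a minimal sketch might look like this (retriever and question come from the question's code; return_source_documents=True is optional here, just to verify how much document text is actually being retrieved):

from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI

model = ChatOpenAI(model_name="gpt-3.5-turbo")
qa = ConversationalRetrievalChain.from_llm(
    model,
    retriever=retriever,              # the DeepLake retriever from the question
    max_tokens_limit=4000,            # keep retrieved docs under ~4000 tokens
    return_source_documents=True,     # optional: inspect what was retrieved
)
result = qa({"question": question, "chat_history": []})
print(result["answer"])
for doc in result["source_documents"]:
    print(len(doc.page_content))      # rough size of each retrieved document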

Longer explainer

In the base.py of ConversationalRetrievalChain there is a function that is called when you ask your question to deeplake/openai:

    def _get_docs(self, question: str, inputs: Dict[str, Any]) -> List[Document]:
        docs = self.retriever.get_relevant_documents(question)
        return self._reduce_tokens_below_limit(docs)

This reads from the deeplake vector database and adds the retrieved documents' text as context to the prompt that gets sent to openai.

The _reduce_tokens_below_limit method reads the max_tokens_limit class instance variable and truncates the input docs so their combined size fits under it.
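For reference, the truncation logic in _reduce_tokens_below_limit is roughly equivalent to this standalone sketch (paraphrased; the exact source may differ between LangChain versions):

from typing import Callable, List

from langchain.schema import Document


def reduce_tokens_below_limit(
    docs: List[Document],
    count_tokens: Callable[[str], int],
    max_tokens_limit: int,
) -> List[Document]:
    # Count the tokens in each retrieved document, then drop documents
    # from the end of the list until the running total fits under the limit.
    tokens = [count_tokens(doc.page_content) for doc in docs]
    num_docs = len(docs)
    token_count = sum(tokens)
    while token_count > max_tokens_limit and num_docs > 0:
        num_docs -= 1
        token_count -= tokens[num_docs]
    return docs[:num_docs]

In the real chain the token counting is done with the LLM's get_num_tokens method, so documents at the end of the retrieved list are simply dropped until the remaining ones fit within max_tokens_limit.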
