
I just followed the example in the LangChain documentation to create a basic QA chatbot.

It works fine, but after enough questions the chat history seems to become too big for the prompt, and I get this error:

    This model's maximum context length is 4097 tokens, however you requested 4107 tokens (4054 in your prompt; 73 for the completion). Please reduce your prompt; or completion length.
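The numbers seem consistent with the prompt itself simply getting too big: 4054 tokens of prompt plus anything requested for the completion is already over the limit. A rough way to see how large a rendered prompt is (a minimal sketch, assuming tiktoken and the default text-davinci-003 model behind langchain's `OpenAI` wrapper, which has the 4097-token context):

    # minimal sketch: count the tokens a rendered prompt would use before sending it
    # (assumes the default text-davinci-003 model behind langchain's OpenAI wrapper, 4097-token context)
    import tiktoken

    encoding = tiktoken.encoding_for_model("text-davinci-003")

    def count_tokens(text: str) -> int:
        return len(encoding.encode(text))

    # 4054 tokens of prompt plus the requested completion already exceeds the 4097 limit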

Here's the code:

    from langchain.chains.question_answering import load_qa_chain
    from langchain.embeddings.openai import OpenAIEmbeddings
    from langchain.llms import OpenAI
    from langchain.memory import ConversationBufferMemory
    from langchain.prompts import PromptTemplate
    from langchain.text_splitter import CharacterTextSplitter
    from langchain.vectorstores import Chroma

    # Load the source document and split it into 1000-character chunks
    with open("./reponse_a_la_vie.txt") as f:
        reponse_a_la_vie = f.read()

    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    texts = text_splitter.split_text(reponse_a_la_vie)

    # Embed the chunks and index them in Chroma for similarity search
    embeddings = OpenAIEmbeddings()

    docsearch = Chroma.from_texts(
        texts, embeddings, metadatas=[{"source": i} for i in range(len(texts))]
    )

    # Prompt template: retrieved context, then chat history, then the new question
    template = """You are a chatbot having a conversation with a human.

    Given the following extracted parts of a long document and a question, create a final answer.

    {context}

    {chat_history}
    Human: {human_input}
    Chatbot:"""

    prompt = PromptTemplate(
        input_variables=["chat_history", "human_input", "context"], template=template
    )

    # Memory keeps the whole conversation and injects it as {chat_history}
    memory = ConversationBufferMemory(memory_key="chat_history", input_key="human_input")

    # "stuff" chain: all retrieved chunks are placed into the prompt verbatim
    chain = load_qa_chain(
        OpenAI(temperature=1, max_tokens=1000), chain_type="stuff", memory=memory, prompt=prompt
    )

    def generate_response(query):
        docs = docsearch.similarity_search(query)
        return chain({"input_documents": docs, "human_input": query}, return_only_outputs=True)['output_text']
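As far as I can tell (my reading of the docs, so treat this as an assumption), the "stuff" chain fills the template with every retrieved chunk plus the entire ConversationBufferMemory buffer on each call, so the prompt grows by roughly one question/answer pair per turn. Roughly this is what ends up being sent:

    # rough sketch of what gets stuffed into the prompt on each turn
    # (my understanding of the "stuff" chain, not exact langchain internals)
    query = "some follow-up question"                                 # any new user turn
    docs = docsearch.similarity_search(query)                         # default k=4 chunks of up to 1000 characters
    context = "\n\n".join(doc.page_content for doc in docs)
    chat_history = memory.load_memory_variables({})["chat_history"]   # every previous Human/AI exchange, verbatim
    prompt_text = template.format(context=context, chat_history=chat_history, human_input=query)
    # prompt_text grows with chat_history every turn, which is what eventually hits the 4097-token limit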

How could I avoid that without altering the memory?
