I just followed the example in the LangChain documentation to create a basic QA chatbot.
It works fine, but after enough questions the chat history seems to become too big for the prompt and I get this error:
This model's maximum context length is 4097 tokens, however you requested 4107 tokens (4054 in your prompt; 73 for the completion). Please reduce your prompt; or completion length.
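For reference, the prompt and the requested completion together have to fit in the model's 4097-token window (LangChain's OpenAI wrapper defaults to text-davinci-003 here, I believe). A quick, chain-independent way to see how many tokens a string uses is tiktoken; this is just a sanity check, not part of my chain:

import tiktoken

# text-davinci-003 has a 4097-token context window
enc = tiktoken.encoding_for_model("text-davinci-003")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))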
Here's the code:
from langchain.chains.question_answering import load_qa_chain
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

with open("./reponse_a_la_vie.txt") as f:
    reponse_a_la_vie = f.read()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(reponse_a_la_vie)

embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_texts(
    texts, embeddings, metadatas=[{"source": i} for i in range(len(texts))]
)
# Prompt template
template = """You are a chatbot having a conversation with a human.
Given the following extracted parts of a long document and a question, create a final answer.
{context}
{chat_history}
Human: {human_input}
Chatbot:"""
prompt = PromptTemplate(
    input_variables=["chat_history", "human_input", "context"], template=template
)
memory = ConversationBufferMemory(memory_key="chat_history", input_key="human_input")
chain = load_qa_chain(
    OpenAI(temperature=1, max_tokens=1000), chain_type="stuff", memory=memory, prompt=prompt
)
def generate_response(query):
    docs = docsearch.similarity_search(query)
    return chain({"input_documents": docs, "human_input": query}, return_only_outputs=True)['output_text']
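To make the problem concrete: with chain_type="stuff", every retrieved chunk plus the whole chat history gets pasted into a single prompt, so the prompt grows with each turn. Something along these lines (names are only for illustration, and it only roughly reproduces what the chain sends) shows where the tokens go:

def prompt_tokens(query):
    # Roughly rebuild the stuffed prompt and count its tokens
    docs = docsearch.similarity_search(query)  # default k=4 chunks of up to 1000 characters
    context = "\n\n".join(doc.page_content for doc in docs)
    rendered = prompt.format(
        context=context,
        chat_history=memory.buffer,  # grows with every exchange
        human_input=query,
    )
    return count_tokens(rendered)  # count_tokens from the tiktoken snippet above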
How can I avoid this error without altering the memory?