
I have been trying to build a document QA chatbot using GPT4All as the LLM and Hugging Face's instructor-large model for embeddings. I was able to create the index, but I'm getting the following as a response. It's not really an error, since there is no traceback; it just shows me the following:

ERROR: The prompt size exceeds the context window size and cannot be processed.

This is a follow-up to my parent question (which was resolved). Here is my code:

from llama_index import VectorStoreIndex, SimpleDirectoryReader
from InstructorEmbedding import INSTRUCTOR
from llama_index import PromptHelper, ServiceContext
from llama_index import LangchainEmbedding
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import OpenLLM
# from langchain.chat_models.human import HumanInputChatModel
from langchain import PromptTemplate, LLMChain
from langchain.llms import GPT4All
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

documents = SimpleDirectoryReader(r'C:\Users\avish.wagde\Documents\work_avish\LLM_trials\instructor_large').load_data()

print('document loaded in memory.......') 

model_id = 'hkunlp/instructor-large'

model_path = "..\models\GPT4All-13B-snoozy.ggmlv3.q4_0.bin"

callbacks = [StreamingStdOutCallbackHandler()]

# Verbose is required to pass to the callback manager
llm = GPT4All(model=model_path, callbacks=callbacks, verbose=True)

print('llm model ready.............')

embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name=model_id))

print('embedding model ready.............')

# define prompt helper
# set maximum input size
max_input_size = 4096
# set number of output tokens
num_output = 256
# set maximum chunk overlap
max_chunk_overlap = 0.2

prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

service_context = ServiceContext.from_defaults(chunk_size=1024, llm=llm, prompt_helper=prompt_helper, embed_model=embed_model)

print('service context set...........')

index = VectorStoreIndex.from_documents(documents, service_context= service_context)

print('indexing done................')

query_engine = index.as_query_engine()

print('query set...........')

response = query_engine.query("What is Apple's financial situation?")
print(response)

Here is a screenshot of the response I got: [screenshot]

I checked on GitHub, where many people have raised this, but I couldn't find anything that resolves it. Here is the GitHub link for the query.

1 Answer


GPT4All seems to have a max input size of 2048(?), but you are setting max_input_size to 4096.

(I'm not totally able to confirm this size; it's based on comments I found via Google: https://github.com/nomic-ai/gpt4all/issues/178)

You can re-adjust your chunk_size and max_input_size to account for this:

# define prompt helper
# set maximum input size
max_input_size = 2048
# set number of output tokens
num_output = 256
# set maximum chunk overlap
max_chunk_overlap = 0.2

prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

service_context = ServiceContext.from_defaults(chunk_size=512, llm=llm, prompt_helper=prompt_helper, embed_model=embed_model)
– Logan
  • Yep. Just to confirm: the max input size is 2048, and with it a chunk size of 512 works; a chunk size of 1024 didn't give me any result for the query. tysm! – Avish Wagde Aug 18 '23 at 05:43
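
As a rough sanity check (the arithmetic below is my own sketch, assuming GPT4All's 2048-token window and llama_index's default similarity_top_k of 2 retrieved chunks per query), here is why a chunk size of 512 fits where 1024 does not:

# Token-budget sketch: why chunk_size=512 fits in a 2048-token window
context_window = 2048
num_output = 256    # tokens reserved for the model's answer
top_k = 2           # assumed default number of retrieved chunks (similarity_top_k)

for chunk_size in (512, 1024):
    prompt_tokens = top_k * chunk_size        # retrieved context alone, before template/question overhead
    available = context_window - num_output   # room left for the whole prompt
    verdict = "fits" if prompt_tokens < available else "too big"
    print(f"chunk_size={chunk_size}: ~{prompt_tokens} context tokens vs ~{available} available -> {verdict}")

With chunk_size=1024, the two retrieved chunks alone (~2048 tokens) already exceed the ~1792 tokens left after reserving the output, which matches what the comment above reports.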