I am trying out Scikit-LLM on a Stack Overflow question dataset of around 7k rows. Below is the code where I train and test a zero-shot classifier.
from sklearn.model_selection import train_test_split

# Stratified split so the closed/open ratio is preserved in both sets
X_train, X_test, y_train, y_test = train_test_split(
    _soQuestions['Body'], _soQuestions['isClosed'],
    test_size=0.33, random_state=42, stratify=_soQuestions['isClosed'])
#%%
from skllm import ZeroShotGPTClassifier
clf = ZeroShotGPTClassifier(openai_model="gpt-3.5-turbo")
clf.fit(X_train, y_train)
labels = clf.predict(X_test)
After half an hour, I received the following error. However, I have no idea how to split the long question bodies into chunks of an appropriate size.
Could not obtain the completion after 3 retries:
InvalidRequestError :: This model's maximum context length is 4097 tokens. However, your messages resulted in 4438 tokens. Please reduce the length of the messages.
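What I have considered so far is simply truncating each question body before passing it to the classifier, so that no single prompt exceeds the context window. This is only a rough sketch: the ~12,000-character budget is my own guess at a safe bound (a few thousand tokens), not an exact token count, and `truncate_body` is a helper I made up, not part of Scikit-LLM.

```python
# Hypothetical workaround: clip each question body to a character budget
# so the resulting prompt stays under the model's 4097-token limit.
# max_chars=12000 is an assumed rough equivalent of ~3000-4000 tokens.
def truncate_body(text: str, max_chars: int = 12000) -> str:
    """Keep only the first max_chars characters of a question body."""
    return text if len(text) <= max_chars else text[:max_chars]

# Quick sanity check on a short and an over-long body
bodies = ["short question", "x" * 20000]
clipped = [truncate_body(b) for b in bodies]
```

If this is a sensible direction, I would presumably apply it to the pandas Series before fitting, e.g. `X_train.apply(truncate_body)`, but I am not sure whether character-based clipping is a reasonable proxy for tokens here.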
I appreciate any advice.