
I am working with a dataset (CSV format) and creating a custom-trained chatbot using the ChatGPT API in Python. The dataset has approximately 1000 observations and 12 variables. I was able to train the model; however, when I ask questions, the chatbot does not give the correct results. For example, when I ask "What is the average age of the employees?", the result I get is 15.5, which is incorrect (it should be around 40). Another example: for "How many males are there in the dataset?", the output is 60, but there are 340 males in the dataset.

I am quite sure that this has something to do with preprocessing the data, but I could not work my way around it. My other idea is to convert it to JSON format, from which the model would be able to learn more accurately.

Has anyone else run into this issue? How did you manage to solve it?

totnan
  • Have you tried to generate sentences to train it on? For every category, automatically generate the sentence "There are {number} {category} in the dataset", and the sentence "The average {xx} of the {yy} is {zz}", etc, and train your chatgpt on that (or feed it as the first part of the prompt, before the question). – Stef Jul 04 '23 at 11:47
  • My issue with this is that if I want to ask a similar question about another variable (e.g. "What is the average salary?"), then I would have to include it manually. This obviously cannot be done for a high-dimensional dataset. – totnan Jul 04 '23 at 11:50
  • Sorry, I meant "automatically", not "manually" – Stef Jul 04 '23 at 12:02
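
The automatic sentence generation suggested in the comments could look roughly like the sketch below. It is only an illustration under some assumptions: the column names `age`, `sex` and `salary` and the sample rows are hypothetical, the helper name `describe_csv` is made up, and a column is treated as numeric only if every non-empty value parses as a number. It uses the standard library so that it runs without extra packages; the same idea carries over to pandas.

```python
import csv
import io
from collections import Counter
from statistics import mean


def describe_csv(text):
    """Return one plain-English sentence per summary statistic of a CSV string."""
    rows = list(csv.DictReader(io.StringIO(text)))
    sentences = [f"There are {len(rows)} rows in the dataset."]
    if not rows:
        return sentences
    for column in rows[0]:
        # Skip missing values so they do not distort counts or averages.
        values = [row[column] for row in rows if row[column] not in ("", None)]
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            numbers = None  # at least one value is not a number
        if numbers:
            # Numeric column: report the mean.
            sentences.append(f"The average {column} is {mean(numbers):.2f}.")
        elif numbers is None:
            # Categorical column: report how often each value occurs.
            for category, count in Counter(values).items():
                sentences.append(
                    f"The number of rows where {column} is {category} is {count}."
                )
    return sentences


# Hypothetical sample data; the real file has about 1000 rows and 12 columns.
sample = "age,sex,salary\n38,male,52000\n42,female,61000\n40,male,47000\n"
facts = describe_csv(sample)
print("\n".join(facts))
```

Because the loop runs over every column, no variable has to be listed by hand, which addresses the concern about high-dimensional data. The resulting sentences can then be used as training text or placed at the start of the prompt, before the question. Columns with many distinct values (names, IDs) would produce one sentence per value, so they may need to be filtered out first.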

0 Answers