Problems getting GPT-3 to conduct statistical analysis of a dataframe or json file

Question

I'm having problems getting GPT-3 to produce simple statistical summaries of a DataFrame/JSON file, using Python pandas as the suggested prompt. Oddly, it handles categorical analysis of the df (value_counts etc.) but seems to hallucinate when faced with numbers, though I'm sure it's just my error. I'm not sure whether it's the way I'm feeding the df in or something else. Cheers.

import openai
import pandas as pd
from sqlalchemy import create_engine

openai.api_key = "API_KEY"

query = "SELECT * FROM table"
engine = create_engine()  # connection URL omitted

df = pd.read_sql_query(query, engine)

completion = openai.Completion.create(
    engine="text-davinci-003",
    temperature=0.5,
    max_tokens=100,
    n=1,
    stop=None,
    # The f-string inserts str(df), the printed form of the DataFrame, into the prompt.
    prompt=f"print sum of values in df called {df}",
)

a = completion.choices[0].text
print(a)

I tried different versions of the file and different ways of calling it, but to no effect. I also changed engines.
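One possible contributor, offered as a guess rather than a confirmed diagnosis: the f-string in the code above places str(df) in the prompt, and with default pandas display options that text is abbreviated for long frames, so the model may never see most of the numbers. A minimal sketch with a made-up 1000-row frame (the column name `value` and the data are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the query result: 1000 numeric rows.
df = pd.DataFrame({"value": range(1000)})

# Same prompt construction as in the question.
prompt = f"print sum of values in df called {df}"

# With default display options, str(df) abbreviates long frames,
# so it has far fewer lines than the frame has rows.
shown_lines = str(df).splitlines()
print(f"rows in df: {len(df)}, lines in str(df): {len(shown_lines)}")

# The exact answer is available locally without any model call.
true_sum = int(df["value"].sum())
print(true_sum)
```

Even when the whole frame fits in the prompt, the model is predicting text rather than running pandas, so computing the number locally is the safer route.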

I solved the issue. Two things: 1. Better prompt design: provide all tables with their column names plus primary/foreign keys, so the model can join them to get your info as appropriate. 2. Don't send the whole df; send only the info mentioned in 1, then apply the result to your df. This also solves any data governance issue your org may have. — Weegie, Dec 21 '22 at 09:59
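The two points in the comment above could be sketched roughly as follows. This is one interpretation, not the commenter's actual code: the tables `customers` and `orders`, the helper `describe_schema`, and the hard-coded `model_sql` (a stand-in for what the model might return) are all hypothetical, and an in-memory SQLite database is used only so the sketch runs on its own.

```python
import sqlite3

# Hypothetical two-table database, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    amount REAL
);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.5), (3, 2, 7.0);
""")


def describe_schema(conn):
    """Return table names, columns, and key relationships as text (no row data)."""
    lines = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    for table in tables:
        cols = []
        # table_info rows: cid, name, type, notnull, dflt_value, pk
        for _cid, name, ctype, _nn, _default, pk in conn.execute(
                f"PRAGMA table_info({table})"):
            cols.append(f"{name} {ctype}" + (" PRIMARY KEY" if pk else ""))
        lines.append(f"{table}({', '.join(cols)})")
        # foreign_key_list rows: id, seq, table, from, to, ...
        for fk in conn.execute(f"PRAGMA foreign_key_list({table})"):
            lines.append(f"  {table}.{fk[3]} references {fk[2]}.{fk[4]}")
    return "\n".join(lines)


schema = describe_schema(conn)

# Only the schema goes into the prompt; no rows leave the machine.
prompt = (
    f"Given this schema:\n{schema}\n"
    "Write one SQLite query that returns the total order amount per customer name."
)

# Stand-in for the model's reply (assumed, not real model output).
model_sql = (
    "SELECT c.name, SUM(o.amount) AS total "
    "FROM orders o JOIN customers c ON c.id = o.customer_id "
    "GROUP BY c.name ORDER BY c.name"
)

# The database does the arithmetic, so the totals are exact.
totals = conn.execute(model_sql).fetchall()
print(totals)
```

With the setup from the question, the returned query could instead be passed to pd.read_sql_query(model_sql, engine). Generated SQL is worth reviewing before it runs against a real database.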
Hi, I was looking for something similar, but I still have a question: if we want to point the model at the dataset, what form does it take? How do we split the data, and how do we pass the headers and the values? And even after doing that, can it give a response? — Aayush Shah, Mar 17 '23 at 13:16
For example, if I pass in my sales data and ask "How have my sales been performing?", will it be able to give me a summary, or does it just not understand the structure? — Aayush Shah, Mar 17 '23 at 13:17
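One way to approach the header/values question, sketched under assumptions rather than offered as a tested recipe: aggregate locally with pandas so the figures are exact, then pass the small result as CSV text (header row first) and ask the model only to describe it. The `sales` frame and its column names are invented for illustration.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration only.
sales = pd.DataFrame({
    "month": ["2023-01", "2023-01", "2023-02", "2023-02"],
    "region": ["north", "south", "north", "south"],
    "revenue": [100.0, 150.0, 120.0, 90.0],
})

# Aggregate locally so the numbers in the prompt are computed by pandas.
monthly = sales.groupby("month", as_index=False)["revenue"].sum()

# to_csv(index=False) yields a header line followed by one line per row.
context = monthly.to_csv(index=False)

prompt = (
    "Monthly revenue as CSV (header row first):\n"
    f"{context}\n"
    "Describe how sales have been performing, using only these figures."
)
print(prompt)
```

This keeps the prompt short and leaves the arithmetic to pandas; how well the model then summarizes the figures still needs checking against the data.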
