I am trying to use the Python REPL tool in LangChain with a CSV file so that the agent answers questions based on the file's contents. The agent gets the action_input step right, meaning it writes sensible code to execute. However, it then fails to answer because it cannot find the dataframe that the code should run on.
For example, when I ask for the longest name in a dataframe that has a column named "name", it returns the following:
Entering new AgentExecutor chain...
{ "action": "python_repl_ast", "action_input": "import pandas as pd\n\n# Assuming the dataset is stored in a pandas DataFrame called 'data'\nnames = data['NAMES']\nlongest_name = max(names, key=len)\nlongest_name" }
Observation: NameError: name 'data' is not defined
Thought:{ "action": "Final Answer", "action_input": "I apologize for the confusion. Unfortunately, I do not have access to the dataset required to find the longest name. Is there anything else I can assist you with?" }
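This is the code: `
import pandas as pd
from langchain.agents import initialize_agent
# The import path below depends on the LangChain version; adjust if needed
from langchain.tools.python.tool import PythonAstREPLTool

# file_path, llm, and conversational_memory are set up earlier (not shown)
df = pd.read_csv(file_path)
# Expose the dataframe to the tool under the variable name "df"
tools = [PythonAstREPLTool(locals={"df": df})]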
agent = initialize_agent(
agent='chat-conversational-react-description',
tools=tools,
llm=llm,
verbose=True,
max_iterations=3,
early_stopping_method='generate',
memory=conversational_memory
)
query = 'What is the longest name?'
print(agent(query))`
Is there a way to pass the dataframe object to the Python REPL tool so that the generated code executes properly and the agent returns the answer? I encounter this problem with the GPT-3.5-turbo model via the API.
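
To check my understanding of the failure, here is a minimal sketch that uses only the standard library. It assumes the tool simply evaluates the generated code against the dictionary passed as locals, which I have not verified in the LangChain source. The helper run_generated is hypothetical, and a plain dict of lists with made-up names stands in for the dataframe.

```python
# Stand-in for the dataframe: a dict of lists supports the same df["name"] lookup.
# The names are made-up sample data.
df = {"name": ["Al", "Barbara", "Christopher"]}

# Mirrors PythonAstREPLTool(locals={"df": df}): only the name "df" is exposed.
tool_locals = {"df": df}


def run_generated(code: str, namespace: dict) -> str:
    """Evaluate one generated expression and report errors as text,
    similar to how the agent log shows an Observation (hypothetical helper)."""
    try:
        return repr(eval(code, {}, namespace))
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"


# The model guessed a variable called "data", which was never passed in.
guessed = run_generated("max(data['name'], key=len)", tool_locals)

# The same expression using the name that actually exists in the namespace.
correct = run_generated("max(df['name'], key=len)", tool_locals)

print(guessed)  # NameError: name 'data' is not defined
print(correct)  # 'Christopher'
```

If that assumption holds, the dataframe is already reachable under df, and the error comes from the model guessing the name data. In that case the agent may need to be told the variable name, for instance in the tool description or the prompt, but I am not sure how best to do that.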