
I am experimenting with LangChain, so my question may not be relevant, but I have trouble finding an example in the documentation.

As far as I understand, SequentialChain is designed to receive one or more inputs for the first chain and then feed the output of chain n-1 into chain n.

Let's say I am working with 3 chains: the first takes as input a snippet of a CSV file and some description of where the CSV came from; the next takes as input the snippet of the CSV file AND the output of the first chain, and produces a Python script as output.
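To make the problem concrete, here is a minimal plain-Python sketch (not LangChain code) of strictly linear chaining, where each step sees only the previous step's output. The names run_linear, review_metrics, and write_script are hypothetical stand-ins for the LLM calls; the only point is that in such a flow the second step never sees the original snippet.

```python
from typing import Callable, List


def run_linear(steps: List[Callable[[str], str]], first_input: str) -> str:
    """Feed the output of step n-1 into step n; nothing else is carried over."""
    value = first_input
    for step in steps:
        value = step(value)
    return value


# Hypothetical stand-ins for the two LLM calls.
def review_metrics(data_snippet: str) -> str:
    return f"metrics for [{data_snippet}]"


def write_script(previous_output: str) -> str:
    # Only the metrics text arrives here; the raw snippet is not available.
    return f"script using [{previous_output}]"


result = run_linear([review_metrics, write_script], "id,price\n1,9.99")
print(result)
```

What I want instead is for the second step to receive both the snippet and the metrics.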

Here is the non-sequential version, which works:

DATA_REVIEW = """ You are a datascientist specialized in business analysis. You are able to retrieve the most relevant metrics in every json file. You are able to give complete and detailed review of how thoses metrics can be used for making profit. A snippet of the full Json is given as context. Your role is to write down all type of metrics that can be retrieved from the full json. Don't do the calculation, the metrics list will be send to a python developer. You also should include metrics that can be used for comparison.

after the metrics list, write the columns name list. 

context:
{data}


Metrics that can be retrieved from the full json:
"""
PYTHON_SCRIPT = """You are a datascientist specialized in business analysis. You are able to write powerfull and efficient python code to retrieve metrics from a dataset. Your role is to write a python script for all type of metrics described above based on the structure of the dataset. Your python script should print all metrics calculated and 
each products followed by their whole metrics. You should always use pandas library. After you printed out all the metrics, store them as in the example below:
metrics_result = f'Total number of products: (total_products)'
metrics_result += f'Average price of products: (avg_price)'
for index, row in df.iterrows():
    metrics_result += f'Product ID: (row["product_id"])'
    metrics_result += f'Product Name: (row["product_name"])'

Make sure to replace unwanted character for each column and to convert value to the desired type before going into calculation. Also pay attention to the columns exact name. Data are represented as a json below but the file they came from is an xlsx. Your code should always start with :



structure:
{data}

Metrics to retrieve:
{output}


python script:


"""
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

# First call: ask the model for the list of metrics.
prompt_template = PromptTemplate(
    input_variables=['data'],
    template=DATA_REVIEW
)
openai = OpenAI(model_name="text-davinci-003", openai_api_key='KEY', temperature=0, max_tokens=3000)
output = openai(prompt_template.format(data=data))

# Second call: pass both the data snippet and the metrics list.
python_script_template = PromptTemplate(
    input_variables=['data', 'output'],
    template=PYTHON_SCRIPT
)
script = openai(python_script_template.format(
    output=output,
    data=data
))


# Actual sequential chain script (not working)

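from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

llm = OpenAI(temperature=0.0)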

prompt = PromptTemplate(
    input_variables=["data_snippet"],
    template="""You are a datascientist specialized in business analysis. You are able to retrieve the most relevant metrics in every json file. You are able to give complete and detailed review of how thoses metrics can be used for making profit. Your next project is for a Beauty e-shop business. a snippet of the full Json is given as context. Your role is to write down all type of metrics that can be retrieved from the full json. You also should include metrics that can be used for comparison.
    context:
        {data_snippet}
    
    metrics that can be retrieved from the complete file:
"""
)


chain = LLMChain(llm=llm, prompt=prompt, output_key='metrics')


data_snippet = read_csv_data(csv_file_path)


data_snippet_str = str(data_snippet)
metrics = chain.run(data_snippet_str)
second_prompt = PromptTemplate(
    input_variables=["data_snippet", "metrics"],
    template=
"""You are a datascientist specialized in business analysis. You are able to write powerfull and efficient python code to retrieve metrics from a dataset. Your role is to write a python script for all type of metrics described above based on the structure of the dataset. Your python script should print all metrics calculated and 
    each products followed by their whole metrics. You should always use pandas library. After you printed out all the metrics, store them as in the example below:
        metrics_result = f'Total number of products: (total_products)'
        metrics_result += f'Average price of products: (avg_price)'
        for index, row in df.iterrows():
            metrics_result += f'Product ID: (row["product_id"])'
            metrics_result += f'Product Name: (row["product_name"])'

    Make sure to replace unwanted character for each column and to convert value to the desired type before going into calculation. Also pay attention to the columns exact name. Data are represented as a json below but the file they came from is an xlsx. Your code should always start with :
        import pandas as pd
        data = CSV_FILE
        df = pd.read_csv(data)


    structure:
        {data_snippet}

    Metrics to retrieve:
        {metrics}


    python script:
"""
)

chain_two = LLMChain(llm=llm, prompt=second_prompt, output_key='script')

from langchain.chains import SimpleSequentialChain

overall_chain = SimpleSequentialChain(chains=[chain, chain_two], input_variables=['data_snippet_str'], output_variables=["metrics","script"], verbose=True)


python_script = overall_chain.run([data_snippet_str, chain_two])
Yilmaz

2 Answers


Your code fails with three validation errors (shown as a screenshot in the original post).

The first two validation errors occur because SimpleSequentialChain does not have output_variables and input_variables named parameters.

The third validation error occurs because every chain in a SimpleSequentialChain must take exactly one input, so each prompt template should have only one input variable.

second_prompt has two input variables:

second_prompt = PromptTemplate(
    input_variables=["data_snippet", "metrics"],

If you remove metrics from input_variables and from the template itself:

Metrics to retrieve:
        # remove this from template
        {metrics}

it will work.
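Note that after this change the one remaining variable in second_prompt receives the first chain's output, so the original snippet no longer reaches the second prompt. For passing several named values between chains, LangChain also provides SequentialChain (as opposed to SimpleSequentialChain), which does accept input_variables and output_variables. The plain-Python sketch below is not LangChain's implementation; it only illustrates the idea of keeping every named value available to later steps. Step, run_named, and the two stand-in functions are hypothetical names.

```python
from typing import Callable, Dict, List, NamedTuple


class Step(NamedTuple):
    input_keys: List[str]      # variables this step reads
    output_key: str            # variable this step writes
    func: Callable[..., str]   # hypothetical stand-in for an LLM call


def run_named(steps: List[Step], inputs: Dict[str, str],
              output_keys: List[str]) -> Dict[str, str]:
    """Run steps in order, keeping every named value available to later steps."""
    known = dict(inputs)
    for step in steps:
        missing = [k for k in step.input_keys if k not in known]
        if missing:
            raise KeyError(f"missing inputs: {missing}")
        kwargs = {k: known[k] for k in step.input_keys}
        known[step.output_key] = step.func(**kwargs)
    return {k: known[k] for k in output_keys}


def review_metrics(data_snippet: str) -> str:
    return f"metrics for [{data_snippet}]"


def write_script(data_snippet: str, metrics: str) -> str:
    return f"script for [{data_snippet}] computing [{metrics}]"


steps = [
    Step(["data_snippet"], "metrics", review_metrics),
    Step(["data_snippet", "metrics"], "script", write_script),
]
results = run_named(steps, {"data_snippet": "id,price"}, ["metrics", "script"])
print(results["script"])
```

Here the second step declares both data_snippet and metrics as inputs, which is the shape the question asks for.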

Yilmaz

I've spent 4 hours so far trying to solve this. It makes me question the complexity LangChain adds to something that should be so simple: passing multiple values between chains.
