
I would like to know whether a pandera decorator can be used to specify multiple output schemas.

For example, say you have a function that returns two dataframes and you want to check the schema of each one using the check_io() decorator:

import pandas as pd
import pandera as pa

from pandera import DataFrameSchema, Column

df = pd.DataFrame({
   "column1": [1, 4, 0, 10, 9],
   "column2": [-1.3, -1.4, -2.9, -10.1, -20.4],
})

in_schema = DataFrameSchema({
   "column1": Column(int),
   "column2": Column(float),
})

out_schema1 = DataFrameSchema({
   "column1": Column(int),
   "column2": Column(float),
   "column3": Column(float),
})

out_schema2 = DataFrameSchema({
   "column1": Column(int),
   "column2": Column(float),
   "column3": Column(int),
})

def preprocessor(df1, df2):
    df_out1 = (df1 + df2).assign(column3=lambda x: x.column1 + x.column2)
    df_out2 = (df1 + df2).assign(column3=lambda x: x.column1 ** 2)
    return df_out1, df_out2

How would this be implemented for the above example?

kanimbla

1 Answer

Just in case anyone else is looking for the solution: pass out a list of (index, schema) tuples, one per element of the returned tuple:

@pa.check_io(df1=in_schema, df2=in_schema, out=[(0, out_schema1), (1, out_schema2)])
def preprocessor(df1, df2):
    df_out1 = (df1 + df2).assign(column3=lambda x: x.column1 + x.column2)
    df_out2 = (df1 + df2).assign(column3=lambda x: x.column1 ** 2)
    return df_out1, df_out2

preprocessor(df, df)
kanimbla