
I am trying to add a conditional expectation that checks if the column "Value" is not equal to zero but only for a subset of the dataset where the column "Condition" contains the string "A".

I have two problems:

  1. I don't know how to express a contains/like match on the "Condition" column, i.e. select the rows where it contains the string "A".

  2. Even when I use the equality-based examples I found online, I get the following error:

     df.expect_column_values_to_not_be_in_set(
         column='Value',
         value_set=[0],
         row_condition='Condition=="A"',
         result_format="SUMMARY"
     )

TypeError: expect_column_values_to_not_be_in_set() got an unexpected keyword argument 'row_condition'

(df was read from a Delta file path and wrapped with SparkDFDataset, imported with "from great_expectations.dataset.sparkdf_dataset import SparkDFDataset".)

Thank you very much in advance!

I also tried adding the condition_parser argument, but I got the same error message.

These are the links I used to come up with my code:
https://docs.greatexpectations.io/docs/reference/expectations/conditional_expectations/#data-docs-and-conditional-expectations
https://legacy.docs.greatexpectations.io/en/latest/reference/conditional_expectations.html

yuki

1 Answer


Try the code below, adapted to your dataset.

import great_expectations as gx
df = spark.read.format("csv").option("header","true").load("/FileStore/tables/source1_data.csv")
display(df)


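# Convert the Spark DataFrame to pandas (this collects all rows to the driver)
pandas_df = df.toPandas()
# Wrap it as a Great Expectations pandas dataset
finalDF = gx.from_pandas(pandas_df)
finalDF.expect_column_values_to_not_be_in_set(
    column='level',
    value_set=[0],
    row_condition='line_code=="D0203"',
    condition_parser='pandas',
    result_format="SUMMARY"
)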
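For the contains/like part of the question, one option is to select the matching rows in pandas first. The sketch below uses made-up sample data with the "Condition" and "Value" column names from the question. It shows a boolean mask built with str.contains and the equivalent DataFrame.query string. Whether that same query string is accepted as row_condition with condition_parser='pandas' is an assumption I have not verified, so the pre-filtered subset is the safer route.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
pandas_df = pd.DataFrame(
    {
        "Condition": ["A1", "B2", "xA", "C3"],
        "Value": [5, 0, 0, 7],
    }
)

# Option 1: boolean mask. regex=False makes this a plain substring match,
# na=False treats missing values as "does not contain".
mask = pandas_df["Condition"].str.contains("A", regex=False, na=False)
subset = pandas_df[mask]

# Option 2: the same filter as a query string. engine="python" is needed
# for .str accessors when the numexpr engine would otherwise be used.
queried = pandas_df.query('Condition.str.contains("A")', engine="python")

# Rows in the subset that break the rule "Value must not be 0".
violations = subset[subset["Value"] == 0]
print(violations)
```

The filtered subset can then be passed to gx.from_pandas and checked with expect_column_values_to_not_be_in_set without any row_condition. The same filter-first idea should also carry over to a Spark DataFrame before it is wrapped, though that is not shown here.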


JayashankarGS
  • Thank you very much, that worked! Does that mean there is no way to implement a conditional expectation when the DataFrame is wrapped with SparkDFDataset? – yuki May 30 '23 at 06:37