I am using Great Expectations for pipeline testing.

I have a DataFrame batch of type great_expectations.dataset.pandas_dataset.PandasDataset.

I want to build the validation expressions dynamically,

i.e. batch.<validation_type>("column_name", value), where the validation type, column name, and value come from a JSON file.

JSON structure:-

{
    "column_name": "sex",
    "validation_type": "expect_column_values_to_be_in_set",
    "validation_value": ["MALE","FEMALE"]
},

When I build this expression, I get the error message shown below.

Code:-

def add_validation(self, batch, validation_list):
    for d in validation_list:
        expression = ("." + d["validation_type"] + "(" + d["column_name"] + ","
                      + str(d["validation_value"]) + ")")
        print(expression)
        batch+expression
        batch.save_expectation_suite(discard_failed_expectations=False)
        return batch

Output:-

print statement output
.expect_column_values_to_be_in_set(sex,['MALE','FEMALE'])

Error:-

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('

Vineet

1 Answer

In great_expectations, the expectation_suite object is designed to capture all of the information necessary to evaluate an expectation. So, in your case, the most natural thing to do would be to translate the source json file you have into the great_expectations expectation suite format.

The best way to do that will depend on where you're getting the original JSON structure from -- you'd ideally want to do the translation as early as possible (maybe even before creating that source JSON?) and keep the expectations in the GE format.

For example, if all of the expectations you have are of the type expect_column_values_to_be_in_set, you could do a direct translation:

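expectations = []
for d in validation_list:
  # Build one expectation config per entry in the source JSON.
  expectation_config = {
    "expectation_type": d["validation_type"],
    "kwargs": {
      "column": d["column_name"],
      "value_set": d["validation_value"]
    }
  }
  # Collect the config so it ends up in the suite below.
  expectations.append(expectation_config)

expectation_suite = {
  "expectation_suite_name": "my_suite",
  "expectations": expectations
}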

On the other hand, if you are working with a variety of different expectations, you would also need to make sure that the validation_value in your JSON gets mapped to the right kwargs for each expectation (for example, expect_column_values_to_be_between takes min_value and/or max_value rather than value_set).
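
One way to handle several expectation types is a small lookup table from expectation type to the kwargs it needs. The following is only a sketch under assumptions: KWARG_BUILDERS, to_expectation_config, build_suite, and apply_to_batch are hypothetical helper names, and it assumes that validation_value is a [min, max] pair for range checks, which the original JSON does not specify. The apply_to_batch helper also shows how an expectation method could be looked up by name with getattr instead of concatenating a string onto the batch, assuming the batch exposes each expectation as a method that accepts these keyword arguments.

```python
# Hypothetical mapping: expectation type -> function that converts the JSON
# "validation_value" into that expectation's keyword arguments.
KWARG_BUILDERS = {
    "expect_column_values_to_be_in_set": lambda v: {"value_set": v},
    # Assumption: validation_value is a [min, max] pair for range checks.
    "expect_column_values_to_be_between": lambda v: {"min_value": v[0], "max_value": v[1]},
    # Expectations that need no extra value beyond the column.
    "expect_column_values_to_not_be_null": lambda v: {},
}


def to_expectation_config(d):
    """Translate one entry of the source JSON into an expectation config dict."""
    expectation_type = d["validation_type"]
    if expectation_type not in KWARG_BUILDERS:
        raise ValueError("No kwargs mapping for %r" % expectation_type)
    kwargs = {"column": d["column_name"]}
    kwargs.update(KWARG_BUILDERS[expectation_type](d.get("validation_value")))
    return {"expectation_type": expectation_type, "kwargs": kwargs}


def build_suite(validation_list, suite_name="my_suite"):
    """Build a suite-shaped dict from the whole validation list."""
    return {
        "expectation_suite_name": suite_name,
        "expectations": [to_expectation_config(d) for d in validation_list],
    }


def apply_to_batch(batch, validation_list):
    """Call each expectation on the batch by name (no string concatenation)."""
    for d in validation_list:
        config = to_expectation_config(d)
        # Look up the method named by the JSON and call it with mapped kwargs.
        method = getattr(batch, config["expectation_type"])
        method(**config["kwargs"])
    return batch
```

With the JSON entry from the question, to_expectation_config would produce kwargs of column "sex" and value_set ["MALE", "FEMALE"]. Raising on an unmapped expectation type is a deliberate choice here, so that a typo in the JSON fails loudly instead of silently skipping a check.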

James