
I want to use the Great Expectations testing suite to run the same validations on many columns. I see that there's a closed feature request to have this as a built-in expectation, but can this be done with a for-loop over the column names?

In addition, I need to filter which columns to test: I am training various computer vision models on different class ids, so I need to select all columns corresponding to class ids.

crypdick

1 Answer


Unfortunately, searching the docs for filter() turns up nothing, but if you check type(batch) you see that it's a great_expectations.dataset.pandas_dataset.PandasDataset, which according to the docs subclasses pandas.DataFrame.

So, you can filter columns as you would a regular dataframe using batch.filter() and run a for loop on the columns:
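For the column selection step, here is a minimal sketch using plain pandas. The column names (`image_id`, `class_0`, `class_1`, `notes`) and the regex are made up for illustration; since the batch subclasses pandas.DataFrame, the same `filter()` call should work on `batch` as well.

```python
import pandas as pd

# Hypothetical data: one column per class id plus unrelated metadata.
df = pd.DataFrame(
    {
        "image_id": [1, 2, 3],
        "class_0": [0.1, 0.7, 0.2],
        "class_1": [0.9, 0.3, 0.8],
        "notes": ["a", "b", "c"],
    }
)

# DataFrame.filter selects columns by label; this regex keeps only
# names of the form "class_<digits>".
class_columns = list(df.filter(regex=r"^class_\d+$").columns)
print(class_columns)  # ['class_0', 'class_1']
```

Only the column names are kept here; the filtered frame itself is thrown away.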

[Screenshot: expectations on filtered columns]

There's a gotcha, though: you can't run the expectations directly on the filtered DataFrame. Instead, use the filtered DataFrame only to get the column names and run the expectations on the original batch dataset, or else you will get errors when you try to do filtered_df.save_expectation_suite().
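A rough sketch of the loop, written as a small helper. `expect_on_columns` is a hypothetical name, and the two expectations and the 0 to 1 bounds are just placeholders for whatever checks you need. It assumes `batch` exposes the usual expectation methods, as the legacy PandasDataset does.

```python
def expect_on_columns(batch, columns, min_value=0.0, max_value=1.0):
    """Run the same expectations on every named column of `batch`.

    `batch` must be the original dataset, not a filtered copy, so that
    the expectations end up in the suite that gets saved.
    """
    results = {}
    for column in columns:
        results[column] = [
            batch.expect_column_values_to_not_be_null(column),
            batch.expect_column_values_to_be_between(
                column, min_value=min_value, max_value=max_value
            ),
        ]
    return results


# Assumed usage (regex is illustrative):
# column_names = batch.filter(regex=r"^class_\d+$").columns
# results = expect_on_columns(batch, column_names)
# batch.save_expectation_suite()
```

The filtered frame is used only for its `.columns`; every expectation and the final save go through `batch`.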

[Screenshot: expectation results]

crypdick