My datasource config looks like:

datasource_config = {
    "name": "example_datasource",
    "class_name": "Datasource",
    "module_name": "great_expectations.datasource",
    "execution_engine": {
        "module_name": "great_expectations.execution_engine",
        "class_name": "PandasExecutionEngine",
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
            "module_name": "great_expectations.datasource.data_connector",
            "batch_identifiers": ["default_identifier_name"],
        },
    },
}
context.add_datasource(**datasource_config)

My pandas DataFrame and batch request were created successfully with the following commands:

...
df = read_csv_pandas(file_path="../done/my_file.txt", 
                           sep="|", 
                           header=0,
                           quoting=csv.QUOTE_ALL)

batch_request = RuntimeBatchRequest(
    datasource_name="example_datasource",
    data_connector_name="default_runtime_data_connector_name",
    data_asset_name="MyDataAsset",
    runtime_parameters={"batch_data": df},
    batch_identifiers={"default_identifier_name": "default_identifier"},
)

My expectation suite:

expectation_suite_name = "My_validations"
suite = context.create_expectation_suite(expectation_suite_name, overwrite_existing=True)

Then I create the validator:

validator = context.get_validator(
    batch_request=batch_request, expectation_suite_name=expectation_suite_name
)
validator.head(2)

The last command successfully prints 2 rows of my dataframe.

Then I add expectations to my suite:

validator.expect_table_columns_to_match_ordered_list(['last_name', 'first_name', 'sex'])
validator.expect_column_values_to_be_in_set("sex", ["male", "female", "other", "unknown"])
validator.save_expectation_suite(discard_failed_expectations=False)

Then I generate the data docs:

suite_identifier = ExpectationSuiteIdentifier(expectation_suite_name=expectation_suite_name)
context.build_data_docs(resource_identifiers=[suite_identifier])
context.open_data_docs(resource_identifier=suite_identifier)

My checkpoint looks like:

name: my_checkpoint_2
config_version: 1
class_name: SimpleCheckpoint
validations:
    - batch_request:
        datasource_name: example_datasource
        data_connector_name: default_runtime_data_connector_name
        data_asset_name: MyDataAsset
        runtime_parameters:
          batch_data: {df}
        batch_identifiers:
          default_identifier_name: default_identifier
expectation_suite_name: My_validations

But this command

context.run_checkpoint(checkpoint_name="my_checkpoint_2")

produces the error:

ValueError: RuntimeDataBatchSpec must provide a Pandas DataFrame or PandasBatchData object.
Valentyn

1 Answer

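Great Expectations supports multiple execution engines. Your datasource specifies the PandasExecutionEngine, which, as the error message states, needs the batch data to be a pandas DataFrame or a PandasBatchData object. Either change the execution engine to SparkDFExecutionEngine, or convert your DataFrame to pandas before passing it as batch_data.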
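The conversion could be guarded with a small helper like the one below. This is only a sketch: ensure_pandas is a hypothetical name, not part of Great Expectations, and it assumes that a non-pandas frame exposes toPandas(), as PySpark DataFrames do. It identifies pandas by the class's module path so that neither pandas nor pyspark has to be imported for the check.

```python
def ensure_pandas(df):
    """Return df as a pandas DataFrame, converting Spark-style frames.

    Hypothetical helper for illustration; not part of Great Expectations.
    """
    cls = type(df)
    # Recognise a plain pandas DataFrame by class name and top-level module,
    # so this function needs no pandas or pyspark import of its own.
    if cls.__name__ == "DataFrame" and cls.__module__.split(".")[0] == "pandas":
        return df
    # PySpark DataFrames expose toPandas(), which collects all rows
    # to the driver and returns a pandas DataFrame.
    to_pandas = getattr(df, "toPandas", None)
    if callable(to_pandas):
        return to_pandas()
    # Anything else (for example a string) cannot be used as batch_data
    # with the PandasExecutionEngine.
    raise TypeError(
        f"Cannot convert {cls.__module__}.{cls.__name__} to a pandas DataFrame"
    )
```

It would be applied where the batch data is handed over, for example runtime_parameters={"batch_data": ensure_pandas(df)}. If pandas is already imported in your script, isinstance(df, pandas.DataFrame) is the more direct check and also accepts DataFrame subclasses.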
