s3fs==2022.8.2
great-expectations==0.15.26
It was not easy to find clear documentation and concrete examples for Great Expectations. After several tries, I managed to connect to the S3 bucket:
import great_expectations as ge
from great_expectations.core.batch import BatchRequest
context = ge.data_context.DataContext(context_root_dir="./great_expectations")
# list the available data asset names for the datasource 's3_datasource'
context.get_available_data_asset_names(datasource_names='s3_datasource')
Output:
{
  "s3_datasource": {
    "default_runtime_data_connector_name": [],
    "default_inferred_data_connector_name": [
      "data/yellow_tripdata_sample_2019-01",
      "data/yellow_tripdata_sample_2019-02"
    ]
  }
}
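The asset names in this output can be reproduced by hand: the inferred connector takes the object keys under its `prefix` and matches each one against `default_regex`, using the capture group as `data_asset_name`. Below is a minimal sketch of that matching step only. The key list is hypothetical (including the `README.txt` entry), and the real connector lists keys from the bucket through boto3 rather than from a Python list.

```python
import re

# Hypothetical object keys standing in for a listing of s3://ge-bucket/data/
# (nothing is read from a real bucket here).
KEYS = [
    "data/yellow_tripdata_sample_2019-01.csv",
    "data/yellow_tripdata_sample_2019-02.csv",
    "data/README.txt",
]

PREFIX = "data/"                     # `prefix` from the data connector config
PATTERN = re.compile(r"(.*)\.csv")   # `default_regex.pattern`
GROUP_NAMES = ["data_asset_name"]    # `default_regex.group_names`


def infer_asset_names(keys, prefix=PREFIX, pattern=PATTERN):
    """Return sorted asset names for keys under `prefix` that match `pattern`.

    The capture group is mapped onto GROUP_NAMES; keys that do not match
    (for example non-CSV files) are skipped.
    """
    names = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        match = pattern.match(key)
        if match:
            groups = dict(zip(GROUP_NAMES, match.groups()))
            names.add(groups["data_asset_name"])
    return sorted(names)


print(infer_asset_names(KEYS))
```

Because the regex is matched against the full key, the `data/` prefix stays part of each asset name, which is consistent with the names listed above.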
# Here is a BatchRequest naming a data_asset
batch_request_parameters = {
'datasource_name': 's3_datasource',
'data_connector_name': 'default_inferred_data_connector_name',
'data_asset_name': 'data/yellow_tripdata_sample_2019-01',
'limit': 1000
}
batch_request=BatchRequest(**batch_request_parameters)
context.create_expectation_suite(
expectation_suite_name='taxi_demo', overwrite_existing=True
)
Output:
{
"data_asset_type": null,
"meta": {
"great_expectations_version": "0.15.26"
},
"expectations": [],
"ge_cloud_id": null,
"expectation_suite_name": "taxi_demo"
}
validator = context.get_validator(
batch_request=batch_request, expectation_suite_name='taxi_demo')
Output:
NoCredentialsError: Unable to locate credentials
Everything works up to this point. The problem appears when I call get_validator, which fails with NoCredentialsError: Unable to locate credentials.
My great_expectations.yaml:
datasources:
  s3_datasource:
    module_name: great_expectations.datasource
    execution_engine:
      class_name: PandasExecutionEngine
      module_name: great_expectations.execution_engine
    class_name: Datasource
    data_connectors:
      default_runtime_data_connector_name:
        module_name: great_expectations.datasource.data_connector
        class_name: RuntimeDataConnector
        batch_identifiers:
          - default_identifier_name
      default_inferred_data_connector_name:
        prefix: data/
        module_name: great_expectations.datasource.data_connector
        default_regex:
          pattern: (.*)\.csv
          group_names:
            - data_asset_name
        boto3_options:
          endpoint_url: http://localhost:9000
          aws_access_key_id: minio
          aws_secret_access_key: minio
        bucket: ge-bucket
        class_name: InferredAssetS3DataConnector
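To check which component in this configuration actually carries the S3 connection settings, the YAML can be mirrored as a Python dict and searched for `boto3_options`. This is a minimal stdlib-only sketch: the dict is a hand-transcribed subset of the YAML above (some `module_name` entries are omitted), so keeping it in sync with the file is an assumption.

```python
# Hand-transcribed subset of the s3_datasource config above
# (assumption: only keys relevant to S3 access are reproduced).
S3_DATASOURCE = {
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "PandasExecutionEngine",
        "module_name": "great_expectations.execution_engine",
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
        },
        "default_inferred_data_connector_name": {
            "class_name": "InferredAssetS3DataConnector",
            "bucket": "ge-bucket",
            "prefix": "data/",
            "boto3_options": {
                "endpoint_url": "http://localhost:9000",
                "aws_access_key_id": "minio",
                "aws_secret_access_key": "minio",
            },
        },
    },
}


def find_key_paths(node, target, path=()):
    """Yield the dotted path of every nested dict that has `target` as a key."""
    if isinstance(node, dict):
        if target in node:
            yield ".".join(path) if path else "<root>"
        for key, value in node.items():
            yield from find_key_paths(value, target, path + (key,))


print(list(find_key_paths(S3_DATASOURCE, "boto3_options")))
```

In this config only the inferred data connector has `boto3_options`; the `execution_engine` block has none of its own.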
Note: when I run great_expectations suite new from the command line, I get roughly the same problem:
EndpointConnectionError: Could not connect to the endpoint URL: "https://ge-bucket.s3.us-west-4.amazonaws.com/data/yellow_tripdata_sample_2019-01.csv"
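The URL in this error has the shape of a default AWS endpoint (virtual-hosted style, `https://<bucket>.s3.<region>.amazonaws.com/<key>`) rather than the MinIO `endpoint_url` from the config, which suggests the client that built it did not receive those `boto3_options`. The sketch below contrasts the two URL shapes. It assumes path-style addressing whenever a custom endpoint is given; botocore's real choice depends on its addressing-style settings, and `s3_object_url` is a hypothetical helper, not a boto3 function.

```python
from typing import Optional


def s3_object_url(bucket: str, key: str, region: str,
                  endpoint_url: Optional[str] = None) -> str:
    """Build an S3 object URL in one of two simplified shapes.

    - With a custom endpoint (e.g. MinIO), assume path-style addressing:
      <endpoint>/<bucket>/<key>
    - Without one, use the AWS virtual-hosted style:
      https://<bucket>.s3.<region>.amazonaws.com/<key>
    """
    if endpoint_url:
        return f"{endpoint_url.rstrip('/')}/{bucket}/{key}"
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"


key = "data/yellow_tripdata_sample_2019-01.csv"
# With the MinIO endpoint from great_expectations.yaml:
print(s3_object_url("ge-bucket", key, "us-west-4", "http://localhost:9000"))
# Without an endpoint, the result has the same form as the URL in the error:
print(s3_object_url("ge-bucket", key, "us-west-4"))
```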
I don't understand where GE gets the S3 credentials from.
After a long debugging session, I noticed that GE looks for the S3 credentials in .aws/config. I really don't understand why GE reads them from .aws/config instead of from my configuration file, the great_expectations.yaml shown above.
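For context, a boto3 client only uses the credentials passed to it explicitly; a client created without them falls back to boto3's default credential chain, which includes environment variables, the shared credentials file (`~/.aws/credentials`) and the config file (`~/.aws/config`), among other providers. The sketch below is a heavily simplified, stdlib-only model of that lookup order for the default profile. File contents are passed in as strings for illustration, and `resolve_credentials` is a hypothetical helper: the real chain has more providers, named profiles and role handling.

```python
import configparser
from typing import Mapping, Optional, Tuple


def _from_ini(text: str, section: str) -> Optional[Tuple[str, str]]:
    """Read aws_access_key_id / aws_secret_access_key from one INI section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if parser.has_section(section):
        sec = parser[section]
        if "aws_access_key_id" in sec and "aws_secret_access_key" in sec:
            return sec["aws_access_key_id"], sec["aws_secret_access_key"]
    return None


def resolve_credentials(explicit: Mapping[str, str],
                        env: Mapping[str, str],
                        credentials_file: str = "",
                        config_file: str = "") -> Optional[Tuple[str, str, str]]:
    """Simplified lookup order; returns (access_key, secret_key, source) or None."""
    # 1. Explicit options, e.g. boto3.client("s3", aws_access_key_id=..., ...)
    if "aws_access_key_id" in explicit and "aws_secret_access_key" in explicit:
        return explicit["aws_access_key_id"], explicit["aws_secret_access_key"], "explicit"
    # 2. Environment variables
    if "AWS_ACCESS_KEY_ID" in env and "AWS_SECRET_ACCESS_KEY" in env:
        return env["AWS_ACCESS_KEY_ID"], env["AWS_SECRET_ACCESS_KEY"], "environment"
    # 3. Shared credentials file (~/.aws/credentials), section [default]
    found = _from_ini(credentials_file, "default")
    if found:
        return (*found, "credentials file")
    # 4. Config file (~/.aws/config), section [default]
    found = _from_ini(config_file, "default")
    if found:
        return (*found, "config file")
    # Nothing found; boto3 raises NoCredentialsError when a request must be signed.
    return None


config = "[default]\naws_access_key_id = a\naws_secret_access_key = b\n"
# Explicit options take precedence over everything else:
print(resolve_credentials({"aws_access_key_id": "minio", "aws_secret_access_key": "minio"}, {}, config_file=config))
# With no explicit options and no environment variables, the config file is used:
print(resolve_credentials({}, {}, config_file=config))
# With nothing configured at all, no credentials are found:
print(resolve_credentials({}, {}))
```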