s3fs==2022.8.2
great-expectations==0.15.26

It was not easy to find clear documentation and concrete examples for Great Expectations. After several tries I managed to connect to the S3 bucket:

import great_expectations as ge
from great_expectations.core.batch import BatchRequest

context = ge.data_context.DataContext(context_root_dir="./great_expectations")

# list available datasets names from datasource name
context.get_available_data_asset_names(datasource_names='s3_datasource')

* * * * * *
** output **
* * * * * *
{
  "s3_datasource":{
  "default_runtime_data_connector_name":[],
  "default_inferred_data_connector_name":[
     "data/yellow_tripdata_sample_2019-01",
     "data/yellow_tripdata_sample_2019-02"]
  }
}

# Here is a BatchRequest naming a data_asset
batch_request_parameters = {
 'datasource_name': 's3_datasource',
 'data_connector_name': 'default_inferred_data_connector_name',
 'data_asset_name': 'data/yellow_tripdata_sample_2019-01',
 'limit': 1000
}

batch_request = BatchRequest(**batch_request_parameters)

context.create_expectation_suite(
   expectation_suite_name='taxi_demo', overwrite_existing=True
)

* * * * * *
** output **
* * * * * *
{
   "data_asset_type": null,
   "meta": {
   "great_expectations_version": "0.15.26"
},
   "expectations": [],
   "ge_cloud_id": null,
   "expectation_suite_name": "taxi_demo"
}

validator = context.get_validator(
    batch_request=batch_request, expectation_suite_name='taxi_demo')

* * * * * *
** output **
* * * * * *
# NoCredentialsError: Unable to locate credentials

Up to this point everything works; the problem appears when I call get_validator: NoCredentialsError: Unable to locate credentials

great_expectations.yaml

datasources:
  s3_datasource:
    module_name: great_expectations.datasource
    execution_engine:
      class_name: PandasExecutionEngine
      module_name: great_expectations.execution_engine
    class_name: Datasource
    data_connectors:
      default_runtime_data_connector_name:
        module_name: great_expectations.datasource.data_connector
        class_name: RuntimeDataConnector
        batch_identifiers:
          - default_identifier_name
      default_inferred_data_connector_name:
        prefix: data/
        module_name: great_expectations.datasource.data_connector
        default_regex:
          pattern: (.*)\.csv
          group_names:
            - data_asset_name
        boto3_options:
          endpoint_url: http://localhost:9000
          aws_access_key_id: minio
          aws_secret_access_key: minio
        bucket: ge-bucket
        class_name: InferredAssetS3DataConnector
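
In the config above, only the data connector receives `boto3_options`; the `PandasExecutionEngine`, which performs the actual S3 reads, is configured without them. One thing worth trying (a sketch — I am assuming the execution engine also accepts a `boto3_options` key, which appears to be the case in 0.15.x) is duplicating the options under `execution_engine`:

```yaml
    execution_engine:
      class_name: PandasExecutionEngine
      module_name: great_expectations.execution_engine
      # Assumption: PandasExecutionEngine also accepts boto3_options;
      # without them the engine falls back to boto3's default credential chain.
      boto3_options:
        endpoint_url: http://localhost:9000
        aws_access_key_id: minio
        aws_secret_access_key: minio
```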

Note

When I try great_expectations suite new on the command line, I get roughly the same problem:

EndpointConnectionError: Could not connect to the endpoint URL: "https://ge-bucket.s3.us-west-4.amazonaws.com/data/yellow_tripdata_sample_2019-01.csv"

I don't understand where GE gets the S3 credentials from!?

After a long debugging session, I noticed that GE looks for S3 credentials in .aws/config. I really don't understand why GE looks for credentials there instead of in my great_expectations.yaml configuration file shown above.
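
That lookup is boto3's default credential chain at work. One way to feed credentials through that chain is via boto3's standard environment variables (a sketch — this supplies only the keys; the endpoint_url must still come from boto3_options in the YAML):

```python
import os

# boto3's standard environment variables: one link in its default
# credential chain, consulted before ~/.aws/credentials and ~/.aws/config.
# Values are the MinIO credentials from the YAML above.
os.environ["AWS_ACCESS_KEY_ID"] = "minio"
os.environ["AWS_SECRET_ACCESS_KEY"] = "minio"
```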

Adil Blanco
  • I am having exactly the same issue. There is no place to add the `boto3_options`. Currently the way it works for me is setting the AWS credentials as environment variables, e.g. `os.environ["AWS_ACCESS_KEY_ID"] = AWS_KEY`. My guess is that it uses boto3's default credential search order: 1. credentials passed as parameters to `boto3.client()`, 2. credentials passed when creating a `Session` object, 3. environment variables, 4. the shared credential file (`~/.aws/credentials`), 5. the AWS config file (`~/.aws/config`), 6. the assume-role provider ... – Camilo Velasquez Oct 07 '22 at 17:21
  • Thanks for the answer. I'm in a dev environment using a local S3 instance, so the solutions you mention give me no way to set the endpoint_url variable. – Adil Blanco Oct 08 '22 at 22:08
  • Maybe you could combine the `AWS_CONFIG_FILE` environment variable with a config file containing something similar to this: https://github.com/boto/boto3/issues/1375#issuecomment-585488189 – Camilo Velasquez Oct 09 '22 at 23:27
  • You are right, my problem is with my local S3 configuration. It doesn't solve my problem, but I accept it anyway since it gave me ideas. Thank you @CamiloVelasquez – Adil Blanco Oct 12 '22 at 19:09
  • @CamiloVelasquez I am currently trying to connect to S3 using Great Expectations. I have specified the execution engine as "class_name": "PandasExecutionEngine", "module_name": "great_expectations.execution_engine", but it still says "No ExecutionEngine configuration provided". Did you face this issue while connecting to S3? – Varun Feb 02 '23 at 23:24
  • @Varun Were you able to solve it? Can you share your config file? Did you validate that it has the appropriate format? I would recommend running the [check config](https://docs.greatexpectations.io/docs/guides/miscellaneous/how_to_use_the_project_check_config_command/) command. – Camilo Velasquez Feb 09 '23 at 16:53
  • @CamiloVelasquez Yes, I was able to connect to S3. I created the datasource_config first, then the data_context_config, and finally added the datasource: `context = gx.get_context(project_config=data_context_config)` followed by `context.add_datasource(**datasource_config)` – Varun Feb 14 '23 at 01:01
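
The programmatic route Varun describes can be sketched roughly as follows (a sketch, not a confirmed fix — names and credentials are assumptions matching the YAML above; only the config dict is built here, and the commented lines show how it would be registered):

```python
# Shared MinIO connection options (assumption: same values as the YAML above).
boto3_options = {
    "endpoint_url": "http://localhost:9000",
    "aws_access_key_id": "minio",
    "aws_secret_access_key": "minio",
}

datasource_config = {
    "name": "s3_datasource",
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "PandasExecutionEngine",
        # The engine performs the actual S3 reads, so it needs the
        # credentials too -- not only the data connector.
        "boto3_options": boto3_options,
    },
    "data_connectors": {
        "default_inferred_data_connector_name": {
            "class_name": "InferredAssetS3DataConnector",
            "bucket": "ge-bucket",
            "prefix": "data/",
            "default_regex": {
                "pattern": r"(.*)\.csv",
                "group_names": ["data_asset_name"],
            },
            "boto3_options": boto3_options,
        }
    },
}

# Registering it on a context would then look like:
# import great_expectations as gx
# context = gx.get_context()
# context.add_datasource(**datasource_config)
```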

0 Answers