great_expectations setup:
Created a new virtual environment.
Installed required packages:
pip install boto3
pip install fsspec
pip install s3fs
Updated the datasource configuration in great_expectations.yml:
datasources:
  pandas_s3:
    class_name: PandasDatasource
Steps to reproduce this issue:
> great_expectations init
Would you like to profile new Expectations for a single data asset within your new Datasource? [Y/n]: Y
Enter the path of a data file (relative or absolute, s3a:// and gs:// paths are ok too)
: s3://my-bucket-name/
We could not determine the format of the file. What is it?
1. CSV
2. Parquet
3. Excel
4. JSON
: 2
I get the following error:
Cannot connect to host s3.amazonaws.com:443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')]
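A "certificate verify failed: unable to get local issuer certificate" message generally means the Python process could not find a CA certificate that chains to the certificate the server presented. As a minimal diagnostic sketch (not part of the original report, and not a confirmed cause of this issue), the snippet below uses only the standard library to list where this interpreter's default SSL context looks for CA certificates and which override variables are set. The helper name `describe_default_ca_paths` is hypothetical. The environment variables listed are an assumption about what may be relevant here: `SSL_CERT_FILE` and `SSL_CERT_DIR` are read by OpenSSL, while `AWS_CA_BUNDLE` and `REQUESTS_CA_BUNDLE` are read by botocore.

```python
import os
import ssl


def describe_default_ca_paths():
    """Collect where this interpreter's default SSL context looks for CA certificates."""
    paths = ssl.get_default_verify_paths()
    report = {
        "cafile": paths.cafile,
        "capath": paths.capath,
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
    }
    # Record which of the reported locations actually exist on disk.
    report["existing"] = sorted(
        name for name, value in report.items() if value and os.path.exists(value)
    )
    # Environment variables that can override the CA bundle (assumed relevant here).
    report["env"] = {
        var: os.environ.get(var)
        for var in ("SSL_CERT_FILE", "SSL_CERT_DIR", "AWS_CA_BUNDLE", "REQUESTS_CA_BUNDLE")
    }
    return report


if __name__ == "__main__":
    for key, value in describe_default_ca_paths().items():
        print(f"{key}: {value}")
```

If `existing` comes back empty, or one of the override variables points at a bundle that only some libraries read, that could explain why one client verifies the S3 endpoint and another does not. This is a hint for narrowing things down, not a diagnosis.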
Note: My AWS setup is in place. ~/.aws contains a credentials file with the following content:
output = json
region = us-east-1
aws_access_key_id = api-key
aws_secret_access_key = secret-key
aws_session_token = session-token
aws_default_acl = None
Note: With the same setup, the code below works fine:
import boto3
import io
import pandas as pd


def pd_read_s3_parquet(key, bucket, s3_client=None, **args):
    if s3_client is None:
        s3_client = boto3.client('s3')
    obj = s3_client.get_object(Bucket=bucket, Key=key)
    return pd.read_parquet(io.BytesIO(obj['Body'].read()), **args)


print(pd_read_s3_parquet(key="books.parquet", bucket="books-bucket-ge"))
So the connection fails only when it is made through the great_expectations library.
I am facing the same issue with the V3 batch_request API when using test_yaml_config.
I am using version 0.13.10.
I am blocked on this. Please suggest ways to resolve the issue. Thanks!