AWS Wrangler provides a convenient interface for consuming S3 objects as pandas DataFrames. I want to use it instead of boto3 clients, resources, or sessions when getting objects. I also need to use SSL verification.
The following boto3 client code works with the Aries root certificate:
import os

import boto3
import botocore.config

aries_cert = os.environ['ARIES_CERT']  # path to the Aries root certificate
s3_session = boto3.Session(
    aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    region_name="us-east-1"
)
# Credentials come from the session, so they are not repeated here.
s3_client = s3_session.client(
    service_name="s3",
    endpoint_url="https://MY-ENDPOINT.com",
    use_ssl=True,
    verify=aries_cert,
    config=botocore.config.Config(
        read_timeout=600,
        connect_timeout=600,
        retries={"max_attempts": 3}
    )
)
# `path` is defined elsewhere; drop an optional "s3://" prefix before splitting.
stripped = path[len('s3://'):] if path.startswith('s3://') else path
bucket, key = stripped.split('/', 1)
obj = s3_client.get_object(Bucket=bucket, Key=key)
# Do stuff with `obj['Body'].read()`
This AWS Wrangler code also works (without the TLS/SSL client cert):
import os

import awswrangler as wr
import boto3
import botocore

wr.config.s3_endpoint_url = "https://MY-ENDPOINT.com"
session = boto3.Session(
    aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    region_name="us-east-1"
)
path = f's3://{path}' if not path.startswith('s3://') else path
df = wr.s3.read_parquet(
    path=path,
    dataset=True,
    boto3_session=session
)
But when I add the TLS/SSL client cert to the botocore config, the read fails:
wr.config.botocore_config = botocore.config.Config(
    retries={"max_attempts": 3},
    connect_timeout=600,
    read_timeout=600,
    client_cert=os.getenv("ARIES_CERT")
)
df = wr.s3.read_parquet(
    path=path,
    dataset=True,
    boto3_session=session
)
Error message:
SSLError: SSL validation failed for https://MY-ENDPOINT.com/MY-BUCKET?list-type=2&prefix=MY-PREFIX-BLAH-BLAH.parquet%2F&max-keys=1000&encoding-type=url [SSL] PEM lib (_ssl.c:3524)
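One possible lead, not confirmed: the working boto3 code passes the certificate as `verify` (the CA bundle used to validate the server), while the failing code passes it as `client_cert`, which botocore documents as the certificate the client itself presents for TLS client authentication. A root certificate normally has no private key, which may be what the `PEM lib` error is complaining about. The sketch below is a rough diagnostic using only the standard library; the helper names `pem_block_labels` and `looks_like_client_cert` are made up for illustration, and it assumes `ARIES_CERT` points to a PEM text file.

```python
import re
from typing import List

# Matches the BEGIN line of each PEM block, e.g. "-----BEGIN CERTIFICATE-----".
_PEM_BEGIN = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def pem_block_labels(pem_text: str) -> List[str]:
    """Return the label of every PEM block found in the text, in order."""
    return _PEM_BEGIN.findall(pem_text)


def looks_like_client_cert(pem_text: str) -> bool:
    """Return True if the text holds both a certificate and a private key."""
    labels = pem_block_labels(pem_text)
    has_cert = "CERTIFICATE" in labels
    # Covers "PRIVATE KEY", "RSA PRIVATE KEY", "EC PRIVATE KEY", etc.
    has_key = any(label.endswith("PRIVATE KEY") for label in labels)
    return has_cert and has_key


if __name__ == "__main__":
    import os

    # Assumption: ARIES_CERT is the path to a PEM file, as in the question.
    with open(os.environ["ARIES_CERT"]) as fh:
        text = fh.read()
    print(pem_block_labels(text))
    print("usable as client_cert:", looks_like_client_cert(text))
```

If this prints only `CERTIFICATE` labels, the file is probably a CA bundle rather than a client certificate. In that case, botocore's documented `AWS_CA_BUNDLE` environment variable might be one way to supply it without constructing the client by hand, though I have not verified that with AWS Wrangler.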
Any idea what's going on here? I'm not finding the AWS Wrangler docs, or those for boto3 and botocore, very helpful:
https://aws-data-wrangler.readthedocs.io/en/latest/tutorials/002%20-%20Sessions.html
https://aws-data-wrangler.readthedocs.io/en/latest/tutorials/021%20-%20Global%20Configurations.html#21---Global-Configurations
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
https://botocore.amazonaws.com/v1/documentation/api/latest/tutorial/index.html
This kind of question has been asked before, so any intuition on how to work with boto3 clients, resources, and sessions in different contexts would also be appreciated.