AWS Wrangler provides a convenient interface for consuming S3 objects as pandas dataframes. I want to use it instead of boto3 clients, resources, or sessions when getting objects. I also need SSL verification.

The following boto3 client code works with the SSL Aries root cert (!):

import awswrangler as wr
import boto3
import botocore.config  # needed for botocore.config.Config below
import os

aries_cert = os.environ['ARIES_CERT']

s3_session = boto3.Session(
    aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    region_name="us-east-1"
)
s3_client = s3_session.client(
    service_name="s3",
    endpoint_url="https://MY-ENDPOINT.com",
    use_ssl=True,
    verify=aries_cert,
    aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
    aws_secret_access_key=os.getenv('AWS_SECRET_ACCESS_KEY'),
    config=botocore.config.Config(
        read_timeout=600,
        connect_timeout=600,
        retries={"max_attempts": 3}
    )
)

# Strip an optional "s3://" scheme first; splitting before stripping would yield "s3:" as the bucket.
stripped = path[len('s3://'):] if path.startswith('s3://') else path
bucket, key = stripped.split('/', 1)
obj = s3_client.get_object(Bucket=bucket, Key=key)
# Do stuff with `obj['Body'].read()`
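
The bucket/key split above can be wrapped in a small helper that accepts paths with or without the `s3://` scheme. The scheme has to be stripped before splitting on `/`, otherwise the first segment comes out as `s3:`. This is only a sketch; the name `split_s3_path` is hypothetical and not part of boto3 or awswrangler.

```python
def split_s3_path(path):
    """Split 'bucket/key' or 's3://bucket/key' into a (bucket, key) tuple."""
    scheme = "s3://"
    # Remove the scheme first so the bucket is the first path segment.
    if path.startswith(scheme):
        path = path[len(scheme):]
    # partition() splits on the first '/' only, so the key keeps its slashes.
    bucket, _, key = path.partition("/")
    if not bucket or not key:
        raise ValueError(f"expected '<bucket>/<key>', got {path!r}")
    return bucket, key
```

The result can be passed straight to `s3_client.get_object(Bucket=bucket, Key=key)`.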

This awswrangler code works too (without the TLS (SSL?) client cert):

import awswrangler as wr
import boto3
import botocore
import os

wr.config.s3_endpoint_url = "https://MY-ENDPOINT.com"

session = boto3.Session(
    aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    region_name="us-east-1"
)
path = f's3://{path}' if not path.startswith('s3://') else path

df = wr.s3.read_parquet(
    path=path,
    dataset=True,
    boto3_session=session
)

But then when I include the TLS (SSL?) client cert, the read fails:

wr.config.botocore_config = botocore.config.Config(
    retries={"max_attempts": 3},
    connect_timeout=600,
    read_timeout=600,
    client_cert=os.getenv("ARIES_CERT")
)
df = wr.s3.read_parquet(
    path=path,
    dataset=True,
    boto3_session=session
)

Error message:

SSLError: SSL validation failed for https://MY-ENDPOINT.com/MY-BUCKET?list-type=2&prefix=MY-PREFIX-BLAH-BLAH.parquet%2F&max-keys=1000&encoding-type=url [SSL] PEM lib (_ssl.c:3524)

Any idea what's going on here? I'm not finding the awswrangler docs, or those for boto3 and botocore, very helpful:

https://aws-data-wrangler.readthedocs.io/en/latest/tutorials/002%20-%20Sessions.html
https://aws-data-wrangler.readthedocs.io/en/latest/tutorials/021%20-%20Global%20Configurations.html#21---Global-Configurations
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
https://botocore.amazonaws.com/v1/documentation/api/latest/tutorial/index.html

This kind of question has been asked before, so any intuition on how to work with boto3 clients, resources, and sessions in different contexts would also be appreciated.

John Rotenstein
Wassadamo

1 Answer

awswrangler loads a default configuration and uses it when creating boto3 clients. You can view all the defaults it uses like this:

import awswrangler as wr

print(wr.config.to_pandas())


To override a default, set the corresponding attribute on the `config` object provided by awswrangler, as shown in the code below.

import awswrangler as wr
import boto3

wr.config.verify = 'your_cert_file_path'
wr.config.s3_endpoint_url = 'your_s3_endpoint_url' 

session = boto3.Session(
    aws_access_key_id='aws_key',
    aws_secret_access_key='aws_secret'
)

wr.s3.does_object_exist('s3://bucket/file_path', boto3_session=session)
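
As an alternative sketch: botocore also reads a CA bundle path from the `AWS_CA_BUNDLE` environment variable when no `verify` value is passed explicitly, so the bundle can be set once for the process before any session or client is created. The helper name `use_ca_bundle` is hypothetical, and whether this avoids the error in this particular setup is untested.

```python
import os


def use_ca_bundle(ca_bundle_path):
    """Export AWS_CA_BUNDLE so botocore-based clients pick up a custom CA bundle."""
    # Fail early with a clear message instead of a later SSL error.
    if not os.path.isfile(ca_bundle_path):
        raise FileNotFoundError(f"CA bundle not found: {ca_bundle_path}")
    os.environ["AWS_CA_BUNDLE"] = ca_bundle_path
    return ca_bundle_path
```

Call it before creating the `boto3.Session`, for example `use_ca_bundle(os.environ['ARIES_CERT'])`.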
Scarface
  • I get the same SSL error. `SSLError: SSL validation failed for /bucket/file_path [SSL] PEM lib (_ssl.c:3932)` – Wassadamo Sep 01 '22 at 07:48
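
One thing worth checking: the working boto3 snippet passes the certificate as `verify`, while the failing awswrangler snippet passes it as `client_cert`. In botocore, `verify` names a CA bundle used to validate the server, whereas `client_cert` names a certificate (optionally paired with a key) that the client presents to the server. A root certificate file normally carries no private key, so it is not a natural fit for `client_cert`. The sketch below uses only the standard library to inspect what a PEM file contains and whether Python's `ssl` module can load it as a CA bundle. The function names are hypothetical, and this is a diagnostic under those assumptions, not a confirmed fix for the error above.

```python
import re
import ssl

# Matches PEM headers such as "-----BEGIN CERTIFICATE-----".
PEM_HEADER = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def pem_block_labels(pem_text):
    """Return the labels of all PEM blocks found in the text, in order."""
    return PEM_HEADER.findall(pem_text)


def loads_as_ca_bundle(path):
    """Return True if Python's ssl module can load `path` as a CA bundle."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    try:
        context.load_verify_locations(cafile=path)
    except (ssl.SSLError, OSError):
        # Covers unparsable content as well as missing or unreadable files.
        return False
    return True
```

If `pem_block_labels` reports only `CERTIFICATE` entries and `loads_as_ca_bundle` returns `True`, the file is a plausible value for `verify`. An empty list would suggest the file is not PEM-encoded at all (for example, DER).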