In MWAA, I am using the following code to access the files in my S3 bucket. The S3 bucket is of the following form:
aws s3 ls s3://example-bucket/incoming/driver-events/ingestDate=2021-05-26/
The above command works fine. Now I am trying to get the same listing from Airflow through an S3_hook.S3Hook() call. I have the following code:
# Imports assumed from the names used below
import logging
from airflow.hooks import S3_hook

bucket='s3://example-bucket/incoming/driver-events/ingestDate=2021-05-26/'
s3_handle = S3_hook.S3Hook(aws_conn_id='s3_default')
key_list = s3_handle.list_keys(bucket_name=bucket)
print(f"{len(key_list)} keys found in bucket")
for keys in key_list:
    logging.info(keys)
This is resulting in an error from boto3:
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid bucket name "s3://example-bucket/incoming/driver-events/ingestDate=2021-05-26/": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:s3:[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
My understanding is that the error occurs because boto3 validates its parameters and the regular expression for bucket names is too restrictive.
How do I handle this case in Airflow? Is there a way to disable the parameter validation? I can see that boto3 lets you set 'parameter_validation' to False through a configuration setting, but how do I do that with an S3Hook() in Airflow, which is already set up with its defaults and does not accept a boto3 configuration? To complicate matters, I have to do this on MWAA, which does not give you any control over the ~/.boto/ folder.
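One thing that may be worth checking, offered as a sketch rather than a confirmed fix: the error message quotes the entire S3 URL as the bucket name, which suggests that bucket_name expects only example-bucket and that the rest of the path belongs in a key prefix. The helper split_s3_url below is hypothetical (it is not part of Airflow or boto3), and the list_keys call in the trailing comment assumes the hook accepts a prefix argument alongside bucket_name.

```python
from urllib.parse import urlsplit


def split_s3_url(s3_url):
    """Split 's3://bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    parts = urlsplit(s3_url)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"Not an S3 URL: {s3_url!r}")
    # The netloc is the bucket; the path (minus its leading slash) is the prefix.
    return parts.netloc, parts.path.lstrip("/")


bucket_name, prefix = split_s3_url(
    "s3://example-bucket/incoming/driver-events/ingestDate=2021-05-26/"
)
print(bucket_name)
print(prefix)

# Assumption: with the two pieces separated, the hook call from the question
# would become something like
#   key_list = s3_handle.list_keys(bucket_name=bucket_name, prefix=prefix)
```

If that assumption holds, the bucket name passes boto3's validation unchanged, so turning off parameter validation would not be needed.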