A similar question was asked previously (how-to-programmatically-set-up-airflow-1-10-logging-with-localstack-s3-endpoint), but it was never resolved.
I have Airflow running in a Docker container set up with docker-compose (I followed this guide). Now I want to download some data from an S3 bucket, but first I need to set up the credentials to allow that. Everywhere this only seems to be done through the UI, by manually entering the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, which exposes them in the UI. I want to set this up in the code itself by reading in the environment variables. In boto3 this would be done with:
import boto3

session = boto3.Session(
    aws_access_key_id=settings.AWS_SERVER_PUBLIC_KEY,
    aws_secret_access_key=settings.AWS_SERVER_SECRET_KEY,
)
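In my case the values would come straight from the environment variables rather than a settings module, so roughly the following (this is just the boto3 equivalent of what I'm after; as far as I know boto3 would actually pick up these two variable names on its own):

import os

import boto3

# Same idea, but reading the credentials from the environment variables
# that docker-compose injects into the container.
session = boto3.Session(
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)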
So how would I do this in the code for the DAGs?
Code:
import traceback

import airflow
from airflow import DAG
from airflow.exceptions import AirflowFailException
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook


def _download_s3_data(templates_dict, **context):
    # contains a list of the values returned
    data = templates_dict.get("sagemaker_autopilot_data")
    if any([not paths for paths in data]):
        raise AirflowFailException("Some of the paths were not passed!")
    else:
        (
            sagemaker_training,
            sagemaker_testing,
        ) = data
        s3hook = S3Hook()
        # parse the s3 url
        bucket_name, key = s3hook.parse_s3_url(s3url=sagemaker_training)
        try:
            # need aws credentials
            file_name = s3hook.download_file(key=key, bucket_name=bucket_name)
        except:
            traceback.print_exc()
            raise AirflowFailException("Error downloading s3 file")
ENV file:
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
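Assuming the .env file is actually passed through to the Airflow services in docker-compose, a quick sanity check one can run inside a container (e.g. via docker-compose exec) to confirm the variables are visible there:

import os

# Print whether the two AWS variables from the .env file above made it into
# the container's environment.
for name in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):
    print(name, "is set" if os.environ.get(name) else "is MISSING")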
Edit:
The Amazon Web Services Connection page seems to be the only documentation about this, but it is somewhat confusing and doesn't mention how to do it programmatically.
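The closest I can piece together from that page (and the Managing Connections docs) is defining the whole connection as an environment variable. Something like the sketch below might work, but I'm not sure this is the intended approach, whether aws_default is the right conn id, or how the secret needs to be escaped:

import os
from urllib.parse import quote_plus

# My reading of the docs (untested): Airflow can pick up connections from
# AIRFLOW_CONN_<CONN_ID> environment variables, so building an "aws://" URI
# from the two variables above might make S3Hook() work without the UI.
# The URI format and the conn id "aws_default" are assumptions on my part.
os.environ["AIRFLOW_CONN_AWS_DEFAULT"] = "aws://{}:{}@".format(
    quote_plus(os.environ["AWS_ACCESS_KEY_ID"]),
    quote_plus(os.environ["AWS_SECRET_ACCESS_KEY"]),
)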