I'm looking to create a DAG that runs two tasks. The ideal workflow would be to create a PEM key file and then SSH using that newly created file.
One task would use the PythonOperator to call AWS Secrets Manager through the SecretsManagerHook, grab a PEM key that is stored there, and write it to a temporary file.
The second task would use the SSHOperator, with its ssh_hook referencing that PEM file.
I can get the temporary file written to a location, but I can't access it in the next task with the SSHOperator.
# USED TO GET PEM KEY FROM AWS SECRETS MANAGER
import tempfile

from airflow.providers.amazon.aws.hooks.secrets_manager import SecretsManagerHook

def retrieve_pem_key_from_aws_secrets_manager(**kwargs):
    # Fetch the PEM key stored in Secrets Manager
    secrets_manager_hook = SecretsManagerHook()
    sm_client = secrets_manager_hook.get_conn()
    secret = sm_client.get_secret_value(SecretId='<SECRET>')
    pem_key_value = secret["SecretString"]
    # Write the key material to a temporary file and return its path
    pem_file = tempfile.NamedTemporaryFile(mode='w+', encoding='UTF-8')
    pem_file.write(pem_key_value)
    pem_file.seek(0)
    return pem_file.name
My PythonOperator task:
task1 = PythonOperator(
    task_id='create_pem_key',
    python_callable=retrieve_pem_key_from_aws_secrets_manager
)
This outputs something such as /tmp/<FILE_NAME>
My SSHOperator task:
task2 = SSHOperator(
    task_id='test_ssh_connectivity',
    ssh_conn_id=None,
    ssh_hook=SSHHook(ssh_conn_id=None, remote_host='<HOST>', username='ec2-user',
                     key_file=retrieve_pem_key_from_aws_secrets_manager()),
    command='echo Hello'
)
The file referenced in the error below is different from the file generated in task1:
airflow.exceptions.AirflowException: SSH operator error: [Errno 2] No such file or directory: '/tmp/tmpt6gh71_b'
It seems that both tasks run on the same host in AWS, but my understanding of MWAA is that Fargate containers are used for task execution, so it is possible these tasks are running in different containers. Is it possible to run both tasks together?
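One thing I have considered (but not tested) is collapsing both steps into a single PythonOperator, so the key file never has to cross a task boundary. A rough sketch of what I mean, reusing the same hooks as above with '<SECRET>' and '<HOST>' as placeholders:
import tempfile

from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.secrets_manager import SecretsManagerHook
from airflow.providers.ssh.hooks.ssh import SSHHook

def fetch_key_and_test_ssh(**kwargs):
    # Fetch the PEM key in the same process that will use it
    sm_client = SecretsManagerHook().get_conn()
    pem_key_value = sm_client.get_secret_value(SecretId='<SECRET>')["SecretString"]
    # Keep the temp file open while the SSH connection is alive so it is not deleted early
    with tempfile.NamedTemporaryFile(mode='w+', encoding='UTF-8') as pem_file:
        pem_file.write(pem_key_value)
        pem_file.flush()
        hook = SSHHook(ssh_conn_id=None, remote_host='<HOST>', username='ec2-user',
                       key_file=pem_file.name)
        ssh_client = hook.get_conn()
        try:
            stdin, stdout, stderr = ssh_client.exec_command('echo Hello')
            return stdout.read().decode()
        finally:
            ssh_client.close()

single_task = PythonOperator(
    task_id='create_pem_key_and_test_ssh',
    python_callable=fetch_key_and_test_ssh
)
I'm not sure whether collapsing the steps like this is considered good practice, which is part of why I'm asking.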
I additionally have the option of using the Secrets Manager backend, but I have not explored that yet and am not sure whether it would help in this scenario.
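If I understand that option correctly, I would store an Airflow SSH connection (with the private key in its extra field) in Secrets Manager and reference it by connection ID instead of writing a key file at all, roughly like this (untested; the connection name ssh_bastion is made up):
from airflow.providers.ssh.operators.ssh import SSHOperator

task2 = SSHOperator(
    task_id='test_ssh_connectivity',
    ssh_conn_id='ssh_bastion',  # hypothetical connection stored in Secrets Manager
    command='echo Hello'
)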
Is there an ideal way to write these tasks so that I can just use Python code to do what I need with minimal configuration of my Airflow environment?