TL;DR: I can't find the temp files created during my Airflow DAG run, no matter what I do.
Hello Folks,
I'm working with an Apache Airflow (1.10.12) system on Ubuntu 20.04.
My process is simple:
- Download FTP Files
- Split the lines of each file and collect them into separate files based on each line's contents (sketched just below)
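For context, the split step's logic is roughly the following (a simplified sketch, not my production code; the '|' delimiter and first-field key are placeholders for my real format):

import os

def split_lines(src_path, out_dir):
    """Route each line of src_path into a per-key file under out_dir."""
    handles = {}
    try:
        with open(src_path) as src:
            for line in src:
                # placeholder rule: the first '|'-delimited field picks the bucket
                key = line.split('|')[0].strip()
                if key not in handles:
                    handles[key] = open(os.path.join(out_dir, key + '.txt'), 'w')
                handles[key].write(line)
    finally:
        for handle in handles.values():
            handle.close()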
For the "Download" step, I'm choosing to download the files to a temp directory which I create with the following BashOperator:
create_temp_dir_command = (
    'pwd '
    '&& tmp_dir=$(mktemp -d -p /var/tmp '
    '-t ftp-$(date +%Y-%m-%d-%H-%M-%S)-airflow-wexftp-XXXXXXXXXX) '
    '&& echo $tmp_dir'  # echoed last, so xcom_push captures the path
)

t2 = BashOperator(
    task_id='create_temp_dir',
    bash_command=create_temp_dir_command,
    xcom_push=True,  # pushes the last line of stdout to XCom
    dag=dag,
)
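For reference, the downstream task picks that path up again via XCom. With xcom_push=True, the BashOperator pushes the last line of stdout (the echoed $tmp_dir) as its return value, so the next task can pull it in a templated command. A simplified sketch (download_ftp_files is a stand-in for my real download task):

t3 = BashOperator(
    task_id='download_ftp_files',
    # ti.xcom_pull returns create_temp_dir's last stdout line,
    # i.e. the path mktemp printed
    bash_command="cd {{ ti.xcom_pull(task_ids='create_temp_dir') }} && ls -la",
    dag=dag,
)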
Because I'm in development, I'd like to inspect the files. However, the files never seem to exist on the server. My logs look like this:
INFO - Temporary script location: /tmp/airflowtmp_vhpdkgo/create_temp_dirxjyod2h5
INFO - Running command: pwd && tmp_dir=$(mktemp -d -p /var/tmp -t ci-$(date +%Y-%m-%d-%H-%M-%S)-airflow-wexftp-XXXXXXXXXX) && echo $tmp_dir
INFO - Output:
INFO - /tmp/airflowtmp_vhpdkgo
INFO - /var/tmp/ci-2020-10-16-11-18-48-airflow-wexftp-Ro0onvw0mq
When I try to change into any of the directories listed in the logs, I'm told they don't exist. I've also checked the sizes of the downloaded files, and they're nonzero, so the files do exist somewhere.
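To be concrete, that size check happens inside a task, something like this (inspect_temp_dir is a hypothetical task_id; from inside the worker, the directory is visible and the files report nonzero sizes):

t4 = BashOperator(
    task_id='inspect_temp_dir',
    # list the directory and total its size, from inside the worker
    bash_command="ls -la {{ ti.xcom_pull(task_ids='create_temp_dir') }} "
                 "&& du -sh {{ ti.xcom_pull(task_ids='create_temp_dir') }}",
    dag=dag,
)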
Why are these directories so invisible and untouchable, even when I'm in a root shell?
Any help is greatly appreciated.
EDIT: I have a single-node setup, so the scheduler and workers are on the same instance.