
TL;DR: I can't find the temp files created during my Airflow DAG run, no matter what I do.

Hello Folks,

I'm working with an Apache Airflow (1.10.12) installation on Ubuntu 20.04.

My process is simple:

  • Download FTP Files
  • Split each file's lines and collect them into separate files based on what's in each line
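The "split" step above could be sketched roughly like this (a minimal sketch only; the `route` rule, file names, and paths are assumptions for illustration, not part of my actual DAG):

```python
import os

def route(line):
    # Hypothetical routing rule: bucket lines by their first
    # comma-separated field (my real rule differs).
    return line.split(",", 1)[0] or "unknown"

def split_file(src_path, out_dir):
    """Write each line of src_path into out_dir/<key>.txt based on route()."""
    os.makedirs(out_dir, exist_ok=True)
    handles = {}
    try:
        with open(src_path) as src:
            for line in src:
                key = route(line.rstrip("\n"))
                if key not in handles:
                    handles[key] = open(os.path.join(out_dir, key + ".txt"), "w")
                handles[key].write(line)
    finally:
        for fh in handles.values():
            fh.close()
```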

For the "Download" step, I'm choosing to download the files to a temp directory which I create with the following BashOperator:

create_temp_dir_command = (
    'pwd '
    '&& tmp_dir=$(mktemp -d -p /var/tmp '
    '-t ftp-$(date +%Y-%m-%d-%H-%M-%S)-airflow-wexftp-XXXXXXXXXX) '
    '&& echo $tmp_dir'
)
t2 = BashOperator(
    task_id='create_temp_dir',
    bash_command=create_temp_dir_command,
    xcom_push=True,
    dag=dag,
)
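For reference, the mktemp invocation can be exercised by hand in a shell on the worker host (same flags as in the command above) to confirm that, outside Airflow, it creates a directory under /var/tmp that persists:

```shell
# Same mktemp call as in the DAG's bash_command, run standalone.
tmp_dir=$(mktemp -d -p /var/tmp -t ftp-$(date +%Y-%m-%d-%H-%M-%S)-airflow-wexftp-XXXXXXXXXX)
echo "$tmp_dir"
ls -ld "$tmp_dir"   # the directory exists and persists after this shell exits
rmdir "$tmp_dir"    # clean up
```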

Because I'm in development, I'd like to inspect the files. However, the files never seem to exist on the server. My logs look like this:

INFO - Temporary script location: /tmp/airflowtmp_vhpdkgo/create_temp_dirxjyod2h5
INFO - Running command: pwd && tmp_dir=$(mktemp -d -p /var/tmp -t ci-$(date +%Y-%m-%d-%H-%M-%S)-airflow-wexftp-XXXXXXXXXX) && echo $tmp_dir
INFO - Output:
INFO - /tmp/airflowtmp_vhpdkgo
INFO - /var/tmp/ci-2020-10-16-11-18-48-airflow-wexftp-Ro0onvw0mq

When I try to change into any of the directories listed in the logs, I'm told they don't exist. I've checked the sizes of the files downloaded to disk, and they are nonzero, so the files do exist.

Why are these directories invisible and untouchable, even from a root shell?

Any help is greatly appreciated.

EDIT: I have a single node setup. So the scheduler and workers are the same instance.

Black Dynamite
    Are you running the worker(s) and scheduler on your local computer? If not could you expound on your setup? – joebeeson Oct 19 '20 at 14:03

0 Answers