My use case is quite simple:

When a file is dropped into the SFTP server directory, an SFTPSensor task should pick up the file with the specified extension (.txt) and process its content.

With path="/test_dir/sample.txt", this works.

My requirement is to match dynamically named files that have the specified extension (text files).

With path="/test_dir/*.txt", the file poking does not work.

Sample code:

from datetime import datetime

from airflow.models import DAG
from airflow.providers.sftp.sensors.sftp import SFTPSensor

default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2022, 4, 16),
}

with DAG(
    "sftp_sensor_test",
    schedule_interval=None,
    default_args=default_args,
) as dag:

    waiting_for_file = SFTPSensor(
        task_id="check_for_file",
        sftp_conn_id="sftp_default",
        path="/test_dir/*.txt",  # NOTE: poking for files with the .txt extension
        mode="reschedule",
        poke_interval=30,
    )

1 Answer

To achieve what you want, I think you should use the file_pattern argument: keep path pointing at the directory and put the glob in file_pattern, as follows:

waiting_for_file = SFTPSensor(
    task_id="check_for_file",
    sftp_conn_id="sftp_default",
    path="test_dir",
    file_pattern="*.txt",
    mode="reschedule",
    poke_interval=30,
)

However, there is currently a bug affecting this feature → https://github.com/apache/airflow/issues/28121

Until that issue is resolved, you can create a local patched version of the sensor in your project by following the explanations in the issue.

Here is the file with the current fix: https://github.com/RishuGuru/airflow/blob/ac0457a51b885459bc5ae527878a50feb5dcadfa/airflow/providers/sftp/sensors/sftp.py
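
If you would rather not copy the whole file, below is a minimal sketch of the same idea: a local SFTPSensor subclass whose poke() lists the directory and applies the pattern itself with fnmatch. This is a workaround sketch, not the official fix; the class name PatchedSFTPSensor is made up, and it assumes your provider version exposes SFTPHook.list_directory() and stores sftp_conn_id, path, and file_pattern as sensor attributes.

from fnmatch import fnmatch

from airflow.providers.sftp.hooks.sftp import SFTPHook
from airflow.providers.sftp.sensors.sftp import SFTPSensor


class PatchedSFTPSensor(SFTPSensor):
    """Workaround sketch: list the directory and match file_pattern with fnmatch."""

    def poke(self, context) -> bool:
        # Here self.path is the directory (e.g. "test_dir") and
        # self.file_pattern is the glob (e.g. "*.txt").
        hook = SFTPHook(ssh_conn_id=self.sftp_conn_id)
        self.log.info("Poking for files matching %s in %s", self.file_pattern, self.path)
        files = hook.list_directory(self.path)
        return any(fnmatch(name, self.file_pattern) for name in files)

You would then instantiate PatchedSFTPSensor with the same arguments as the SFTPSensor above (path="test_dir", file_pattern="*.txt").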