0

I'm really curious to know, how airflow sensors work when a file is not completely copied/loaded to a target data storage. As an example: we have a filesystem, and sensor checks files within it. We are copying a large file to folder and it really takes some time. Will airflow sensor consume incomplete file or it will wait for the file to be fully loaded?

I'm really looking for an answer and haven't found anything similar

1 Answers1

0

you can see inside the code of FileSensor the answer.

The answer is that the operation system is responsible for the answer. in a test I made while downloading a big file "Mac" for example returned True while downloading the file

the idea is while you copy a big file give it a temporary extension and replace it to the real one after copy finished

 def poke(self, context: Context):
    hook = FSHook(self.fs_conn_id)
    basepath = hook.get_path()
    full_path = os.path.join(basepath, self.filepath)
    self.log.info("Poking for file %s", full_path)

    for path in glob(full_path, recursive=self.recursive):
        if os.path.isfile(path):
            mod_time = datetime.datetime.fromtimestamp(os.path.getmtime(path)).strftime("%Y%m%d%H%M%S")
            self.log.info("Found File %s last modified: %s", str(path), mod_time)
            return True

        for _, _, files in os.walk(path):
            if len(files) > 0:
                return True
    return False
ozs
  • 3,051
  • 1
  • 10
  • 19
  • so, as i understood your answer, isfile returns True even if file is not completely copied, cause it is opened at the moment. So i need to implement adding, as an example '.' at the start of filename, and then replace it. Am i right? – Data Therapist Jun 16 '23 at 20:45
  • right. Its a known trick. You can see an example when chrome is downloading file. – ozs Jun 17 '23 at 19:24