
I am currently trying to customize a TFX pipeline that is orchestrated by Airflow.

Basically, I want to upload all files generated by the DAG to an S3 bucket after the pipeline run.
The issue is that, by my understanding, every TFX pipeline component gets parsed into an individual DAG, so I cannot access these components via abc.downstream("xy"). See this documentation for how the pipeline gets built: dag-runner.

My code is based on the tfx-taxi-example; a simplified sketch of my DAG file is shown below.
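Roughly, the end of my DAG file looks like this. It is only a sketch: _pipeline_name, _airflow_config, the pipeline root, the target bucket name, and the extra upload task are placeholders from my setup, and _create_pipeline is the function from the taxi example (not reproduced here).

```python
import datetime
import os

import boto3
from airflow.operators.python import PythonOperator
from tfx.orchestration.airflow.airflow_dag_runner import AirflowDagRunner
from tfx.orchestration.airflow.airflow_dag_runner import AirflowPipelineConfig

_pipeline_name = "taxi_pipeline"          # placeholder
_pipeline_root = "/path/to/pipeline_root" # placeholder
_airflow_config = {
    "schedule_interval": None,
    "start_date": datetime.datetime(2022, 1, 1),
}


def _upload_artifacts_to_s3(**context):
    # Copy everything the pipeline wrote under the pipeline root
    # into a second S3 bucket (bucket name is a placeholder).
    s3 = boto3.client("s3")
    for root, _, files in os.walk(_pipeline_root):
        for name in files:
            local_path = os.path.join(root, name)
            key = os.path.relpath(local_path, _pipeline_root)
            s3.upload_file(local_path, "my-target-bucket", key)


# As in the taxi example, AirflowDagRunner.run(...) turns the TFX
# pipeline into the Airflow DAG that Airflow picks up.
DAG = AirflowDagRunner(AirflowPipelineConfig(_airflow_config)).run(
    _create_pipeline(pipeline_name=_pipeline_name, pipeline_root=_pipeline_root)
)

# This is the part I cannot wire up: I would like this task to run after
# all TFX components have finished, but I have no handle on the component
# tasks to declare it as their downstream.
upload_task = PythonOperator(
    task_id="upload_artifacts_to_s3",
    python_callable=_upload_artifacts_to_s3,
    dag=DAG,
)
```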

Maybe I have not fully grasped the concepts of the TFX pipeline yet, so please let me know if I am trying to solve this in an overly complex way. I would greatly appreciate an alternative approach to what I am trying to accomplish ;)

niklas_den
  • Not sure what you want to fetch -> Perhaps the artifacts that you wrote to the S3 bucket? If yes, you can find them in your AIRFLOW_HOME, which you define when you run this pipeline. An alternative would be to use ZenML, which has a simpler syntax and can fetch artifacts directly: https://github.com/zenml-io/zenml/tree/main/examples/airflow_local. P.S. I'm the co-creator of ZenML (disclaimer), so let me know if you need help! – Hamza Tahir Feb 03 '22 at 18:15
  • Thanks for reaching out! Actually, I want to upload all the generated artifacts to a different S3 bucket. So after the normal TFX pipeline steps have run, I would like to move the data to a different bucket. I will definitely check out your project. – niklas_den Feb 03 '22 at 20:42

0 Answers