I'd like to use the latest Airflow (2.5) and the latest TensorFlow Extended (TFX 1.12), but their Python dependencies conflict. I built my pipeline from TFX components and would like to reuse them in Airflow. How can I build the TFX pipeline in Airflow despite the dependency conflicts?

I know I can run Airflow tasks in separate virtualenvs, but that doesn't help, because my problem is with defining the DAG itself. When Airflow imports DAGs, it does so in its own environment (without TFX installed), but to create an AirflowDagRunner object I have to import the class from tfx.orchestration.airflow.airflow_dag_runner, which requires the TFX packages to be installed.
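To make the distinction concrete, here is a minimal sketch using only the standard library. The helper names `module_available` and `run_in_other_env` are hypothetical and are not part of Airflow or TFX. It only illustrates the asymmetry: work can be handed to another interpreter at run time, but a module-level import has to succeed in the interpreter that parses the file.

```python
import importlib.util
import subprocess
import sys


def module_available(name: str) -> bool:
    """Return True if the top-level package `name` is importable
    by the interpreter that is running this file."""
    return importlib.util.find_spec(name) is not None


def run_in_other_env(python_executable: str, code: str) -> str:
    """Run `code` with a different interpreter (for example, the one
    inside another virtualenv) and return its stripped stdout."""
    result = subprocess.run(
        [python_executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # In the Airflow environment described above this would print False,
    # which is why a module-level `from tfx... import ...` fails there.
    print("tfx importable here:", module_available("tfx"))

    # Task bodies can still be delegated to another interpreter.
    # sys.executable is only a stand-in for the TFX virtualenv's python.
    print(run_in_other_env(sys.executable, "print('ran in other interpreter')"))
```

In my setup the first check is the one that fails: the DAG file is parsed before any task has a chance to switch environments.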

All the examples I could find online pin specific, older versions of TFX and Airflow that do not conflict, with both installed in the same Python environment.

What are the best practices for building an Airflow pipeline/DAG from TFX components, given that TFX and Airflow are installed in separate virtualenvs?

Additional info: I am currently trying to run this locally.

Sample from the DAG definition:

from tfx.orchestration.airflow.airflow_dag_runner import AirflowDagRunner
from tfx.orchestration.airflow.airflow_dag_runner import AirflowPipelineConfig

DAG = AirflowDagRunner(AirflowPipelineConfig(_airflow_config)).run(
    _create_pipeline(
        pipeline_name=_pipeline_name,
        pipeline_root=_pipeline_root,
        data_root=_data_root,
        module_file=_module_file,
        serving_model_dir=_serving_model_dir,
        metadata_path=_metadata_path,
        beam_pipeline_args=_beam_pipeline_args))