Use case:

Use Airflow operators to launch a Dataflow template from KFP pipelines, but the pipeline should be parameterised.

Issue:

I can't find a way to pass the KFP pipeline args to the component function as extra args, i.e.:

import os

from kfp.v2 import compiler
from kfp.v2.dsl import pipeline
from kfp import components

from airflow.providers.google.cloud.operators.dataflow import DataflowTemplatedJobStartOperator

dataflow_template_launch_task_factory = components.create_component_from_airflow_op(
    op_class=DataflowTemplatedJobStartOperator,
    base_image="apache/airflow:2.4.3-python3.9"
)


@pipeline(name="my-pipeline", description="test-pipeline", pipeline_root=os.environ['PIPELINE_ROOT'])
def pipeline(
        dataflow_params: dict,
):
    launch_template_1 = dataflow_template_launch_task_factory(**dataflow_params)  # <-- As far as I'm aware, the args have to be passed exactly like this.


compiler.Compiler().compile(pipeline_func=pipeline, package_path="pipeline.json")

Error:

Traceback (most recent call last):
  File "/pipelines/test.py", line 22, in <module>
    compiler.Compiler().compile(pipeline_func=pipeline, package_path="pipeline.json")
  File "/opt/anaconda3/envs/myenv/lib/python3.9/site-packages/kfp/v2/compiler/compiler.py", line 1301, in compile
    pipeline_job = self._create_pipeline_v2(
  File "/opt/anaconda3/envs/myenv/lib/python3.9/site-packages/kfp/v2/compiler/compiler.py", line 1223, in _create_pipeline_v2
    pipeline_func(*args_list)
  File "/pipelines/test.py", line 19, in pipeline
    launch_template_1 = dataflow_template_launch_task_factory(**dataflow_params)
TypeError: DataflowTemplatedJobStartOperator() argument after ** must be a mapping, not PipelineParam
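
If I understand it correctly, the error happens because dataflow_params is a PipelineParam placeholder at compile time rather than an actual dict, so Python cannot **-expand it: the keyword names have to be literal when the pipeline is compiled. Spelling the arguments out as individual pipeline parameters avoids the error (a minimal sketch; the operator arguments shown below are just illustrative):

@pipeline(name="my-pipeline-explicit", description="test-pipeline", pipeline_root=os.environ['PIPELINE_ROOT'])
def pipeline_explicit(
        task_id: str,
        template: str,
        location: str,
        project_id: str,
):
    # Placeholders are fine as individual argument values; they just
    # cannot be **-expanded as a mapping at compile time.
    launch_template_1 = dataflow_template_launch_task_factory(
        task_id=task_id,
        template=template,
        location=location,
        project_id=project_id,
    )

But this defeats the purpose: it hard-codes every keyword into the pipeline signature, whereas I need the set of operator arguments to stay dynamic.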

I've also tried to work around this by accessing the pipeline args in a Python function component, returning the desired dictionary, and then passing the extra args with **function.output, but the issue is the same.
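
That attempt looked roughly like this (a minimal sketch; build_dataflow_params and its template/location inputs are illustrative names, not my real pipeline):

from kfp.v2.dsl import component


@component
def build_dataflow_params(template: str, location: str) -> dict:
    return {"template": template, "location": location}


@pipeline(name="my-pipeline-indirect", description="test-pipeline", pipeline_root=os.environ['PIPELINE_ROOT'])
def pipeline_indirect(template: str, location: str):
    params_task = build_dataflow_params(template=template, location=location)
    # Fails the same way: params_task.output is still a placeholder,
    # not a mapping, when the pipeline is compiled.
    launch_template_1 = dataflow_template_launch_task_factory(**params_task.output)

Is there a supported way to pass a dynamic set of keyword arguments to a component like this?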
