Use case:
Use Airflow operators to launch a Dataflow template from KFP pipelines, where the pipeline itself is parameterised.
Issue:
I can't find a way to pass the KFP pipeline args to the component function as extra keyword arguments, i.e.:
import os

from airflow.providers.google.cloud.operators.dataflow import DataflowTemplatedJobStartOperator
from kfp import components
from kfp.v2 import compiler
from kfp.v2.dsl import pipeline

dataflow_template_launch_task_factory = components.create_component_from_airflow_op(
    op_class=DataflowTemplatedJobStartOperator,
    base_image="apache/airflow:2.4.3-python3.9",
)

@pipeline(name="my-pipeline", description="test-pipeline", pipeline_root=os.environ["PIPELINE_ROOT"])
def pipeline(
    dataflow_params: dict,
):
    # As far as I can tell, the params have to be unpacked exactly like this.
    launch_template_1 = dataflow_template_launch_task_factory(**dataflow_params)

compiler.Compiler().compile(pipeline_func=pipeline, package_path="pipeline.json")
Error:
Traceback (most recent call last):
  File "/pipelines/test.py", line 22, in <module>
    compiler.Compiler().compile(pipeline_func=pipeline, package_path="pipeline.json")
  File "/opt/anaconda3/envs/myenv/lib/python3.9/site-packages/kfp/v2/compiler/compiler.py", line 1301, in compile
    pipeline_job = self._create_pipeline_v2(
  File "/opt/anaconda3/envs/myenv/lib/python3.9/site-packages/kfp/v2/compiler/compiler.py", line 1223, in _create_pipeline_v2
    pipeline_func(*args_list)
  File "/pipelines/test.py", line 19, in pipeline
    launch_template_1 = dataflow_template_launch_task_factory(**dataflow_params)
TypeError: DataflowTemplatedJobStartOperator() argument after ** must be a mapping, not PipelineParam
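For comparison, I'd expect a variant that passes each argument individually to compile, since each pipeline parameter maps to a named component input, but that defeats the purpose of keeping the pipeline parameterised with a single dict (the argument names and values below are placeholders, not my real pipeline):

@pipeline(name="my-pipeline-explicit", description="test-pipeline", pipeline_root=os.environ["PIPELINE_ROOT"])
def pipeline_explicit(project_id: str, template: str):
    # Wiring each pipeline parameter to a named input avoids the **dict
    # unpacking, but hard-codes the argument list into the pipeline.
    launch_template_1 = dataflow_template_launch_task_factory(
        task_id="launch-template",
        project_id=project_id,
        template=template,
        job_name="my-dataflow-job",
    )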
I've also tried building the desired dictionary inside a Python function component and then unpacking its output into the factory with **function_task.output, roughly as sketched below, but the issue is the same: the output is still a PipelineParam at compile time, not a mapping.
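A minimal sketch of that second attempt (make_dataflow_params is an illustrative stand-in for my real helper, and the parameter names and values are placeholders):

from kfp.v2.dsl import component

@component(base_image="python:3.9")
def make_dataflow_params(project_id: str, template: str) -> dict:
    # Assemble the kwargs dict for the Dataflow operator component.
    return {
        "task_id": "launch-template",
        "project_id": project_id,
        "template": template,
        "job_name": "my-dataflow-job",
    }

@pipeline(name="my-pipeline-2", description="test-pipeline", pipeline_root=os.environ["PIPELINE_ROOT"])
def pipeline_via_component(project_id: str, template: str):
    params_task = make_dataflow_params(project_id=project_id, template=template)
    # params_task.output is a PipelineParam at compile time, not a dict,
    # so the ** unpacking fails with the same TypeError as above.
    launch_template_1 = dataflow_template_launch_task_factory(**params_task.output)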