We are using Airflow's KubernetesPodOperator for our data pipelines, and we would like to add the option to pass in parameters via the UI. Currently we have different yaml files that store the parameters for the operator, and instead of calling the operator directly we call a function that does some prep and returns the operator, like this:
def prep_kubernetes_pod_operator(yaml_path):
    # ... read the yaml and extract the params
    return KubernetesPodOperator(**params)

with DAG(...):
    task1 = prep_kubernetes_pod_operator(yaml_path)
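
Fleshed out, the prep function is roughly this (a simplified sketch; our real yaml files carry more keys, and the import path depends on the Airflow version):

import yaml
# 1.10-style import; on Airflow 2 the operator lives in the
# cncf.kubernetes provider package instead
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

def prep_kubernetes_pod_operator(yaml_path):
    # Load the static task definition from the yaml file
    with open(yaml_path) as f:
        params = yaml.safe_load(f)
    # Hand the keys (task_id, name, image, secrets, ...) straight to the operator
    return KubernetesPodOperator(**params)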
For us this works well and keeps our dag files pretty lightweight. However, we would now like to be able to pass some extra params via the UI. I understand that the trigger params can be accessed via kwargs['dag_run'].conf, but I had no success pulling these into the Python function.
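
As far as I can tell, the problem is that the prep function runs when the DAG file is parsed, while dag_run.conf only exists once a run has actually been triggered, so there is nothing to pull at that point. Roughly what I attempted (simplified):

def prep_kubernetes_pod_operator(yaml_path, **kwargs):
    # 'dag_run' is not in kwargs here: this function runs at DAG-parse
    # time, before any run (and its conf) exists, so this fails
    extra_params = kwargs["dag_run"].conf
    # ... merge extra_params into the yaml params ...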
Another thing I tried was to create a custom operator, because that recognises the args, but I couldn't manage to call the KubernetesPodOperator in the execute part (and I guess calling an operator from inside another operator is not the right solution anyway).
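
Reconstructed from memory, that attempt looked something like this (the class name and the read_yaml helper are made up):

from airflow.models import BaseOperator

class YamlPodOperator(BaseOperator):  # hypothetical name for the attempt
    def __init__(self, yaml_path, **kwargs):
        super().__init__(**kwargs)
        self.yaml_path = yaml_path

    def execute(self, context):
        # Build a KubernetesPodOperator on the fly and run it by hand -
        # the operator-in-an-operator pattern that felt wrong
        params = read_yaml(self.yaml_path)  # made-up helper
        params.update(context["dag_run"].conf or {})
        pod_op = KubernetesPodOperator(task_id=f"{self.task_id}_pod", **params)
        return pod_op.execute(context)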
Update:
Following NicoE's advice, I started to extend the KubernetesPodOperator instead. The error I am having now is that when I parse the yaml and assign the arguments afterwards, the parent arguments become tuples, which throws a type error.
dag:

task = NewKPO(
    task_id="task1",
    yaml_path=yaml_path)
operator:

class NewKPO(KubernetesPodOperator):
    @apply_defaults
    def __init__(
            self,
            yaml_path: str,
            name: str = "default",
            *args,
            **kwargs) -> None:
        self.yaml_path = yaml_path
        self.name = name
        super(NewKPO, self).__init__(
            name=name,  # DAG is not parsed without this line - 'key has to be string'
            *args,
            **kwargs)

    def execute(self, context):
        # parsing yaml and adding context["dag_run"].conf (...)
        self.name = yaml.name
        self.image = yaml.image
        self.secrets = yaml.secrets
        # (...) if I run type(self.secrets) here I will get a tuple
        return super(NewKPO, self).execute(context)
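
To show what the elided parsing block is doing, here is a stripped-down version of my execute (the key names are placeholders for our real yaml schema, and the merge details are trimmed):

import yaml as yaml_lib

# inside NewKPO - expanded version of the execute above
def execute(self, context):
    # Static params from the yaml file...
    with open(self.yaml_path) as f:
        cfg = yaml_lib.safe_load(f)
    # ...overlaid with whatever was passed via the trigger UI
    cfg.update(context["dag_run"].conf or {})
    # Reassign the parent operator's fields before running the pod
    self.name = cfg["name"]
    self.image = cfg["image"]
    self.secrets = cfg.get("secrets", [])
    return super(NewKPO, self).execute(context)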