
We are using Airflow's KubernetesPodOperator for our data pipelines. What we would like to add is the option to pass in parameters via the UI.

We currently keep the operator's parameters in separate YAML files, and instead of calling the operator directly we call a function that does some prep work and returns the operator, like this:

def prep_kubernetes_pod_operator(yaml_path):

    # ... read the YAML file and extract the operator arguments into `params`

    return KubernetesPodOperator(**params)

with DAG(...):

    task1 = prep_kubernetes_pod_operator(yaml_path)

For us this works well and keeps our DAG files pretty lightweight. However, we would now like to add the ability to pass some extra params via the UI. I understand that the trigger params can be accessed via kwargs['dag_run'].conf, but I had no success pulling these into the Python function.

Another thing I tried is creating a custom operator, since that recognises the args, but I couldn't manage to call the KubernetesPodOperator in the execute part (and I guess calling an operator from inside another operator is not the right solution anyway).

Update:

Following NicoE's advice, I started to extend the KubernetesPodOperator instead.

The error I am having now is that when I parse the YAML and assign the arguments afterwards, the parent's arguments become tuples, which throws a type error.

dag:

task = NewKPO(
    task_id="task1",
    yaml_path=yaml_path)

operator:

class NewKPO(KubernetesPodOperator):
    @apply_defaults
    def __init__(
            self,
            yaml_path: str,
            name: str = "default",
            *args,
            **kwargs) -> None:
        self.yaml_path = yaml_path
        self.name = name
        super(NewKPO, self).__init__(
            name=name,  # DAG is not parsed without this line - 'key has to be string'
            *args,
            **kwargs)

    def execute(self, context):
        # parsing yaml and adding context["dag_run"].conf (...)
        self.name = yaml.name
        self.image = yaml.image
        self.secrets = yaml.secrets
        # (...) if I run type(self.secrets) here I will get a tuple
        return super(NewKPO, self).execute(context)
– a54i

1 Answer


You could use params, a dictionary that can be defined at the DAG level and remains accessible in every task. It works for every operator derived from BaseOperator and can also be set from the UI.

The following example shows how to use it with different operators. params can be defined in the default_args dict or passed as an argument to the DAG object.

default_args = {
    "owner": "airflow",
    'params': {
        "param1": "first_param",
        "param2": "second_param"
    }
}

dag = DAG(
    dag_id="example_dag_params",
    default_args=default_args,
    start_date=days_ago(1),
    schedule_interval="@once",
    tags=['example_dags'],
    catchup=False
)

When triggering this DAG from the UI you could add an extra param:

[Image: setting params while triggering the DAG from the UI]
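
For instance, to produce the param3 value shown in the outputs below, the configuration JSON entered in the trigger form would look something like this (assuming, as the outputs below show, that this setup merges the trigger conf into params):

    {"param3": "param_from_the_UI"}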

Params can be accessed in templated fields, as in the BashOperator case:

with dag:

    bash_task = BashOperator(
        task_id='bash_task',
        bash_command='echo bash_task: {{ params.param1 }}')

bash_task logs output:

{bash.py:158} INFO - Running command: echo bash_task: first_param
{bash.py:169} INFO - Output:
{bash.py:173} INFO - bash_task: first_param
{bash.py:177} INFO - Command exited with return code 0
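
The same templating works for the KubernetesPodOperator, whose template_fields are 'image', 'cmds', 'arguments', 'env_vars', 'labels', 'config_file' and 'pod_template_file'. A minimal sketch, assuming the cncf.kubernetes provider is installed (the task_id, name, namespace and image values are made up for illustration):

    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

    kpo_task = KubernetesPodOperator(
        task_id='kpo_task',
        name='kpo-pod',
        namespace='default',
        image='busybox',
        cmds=['echo'],
        # a templated field: rendered from params (including UI extras) at runtime
        arguments=['{{ params.param1 }}'],
    )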

Params are also accessible within the execution context, as in a python_callable:


    def _print_params(**kwargs):
        print(f"Task_id: {kwargs['ti'].task_id}")
        for k, v in kwargs['params'].items():
            print(f"{k}:{v}")

    python_task = PythonOperator(
        task_id='python_task',
        python_callable=_print_params,
    )

Output:

{logging_mixin.py:104} INFO - Task_id: python_task
{logging_mixin.py:104} INFO - param1:first_param
{logging_mixin.py:104} INFO - param2:second_param
{logging_mixin.py:104} INFO - param3:param_from_the_UI

You could also add params at the task level:

    python_task_2 = PythonOperator(
        task_id='python_task_2',
        python_callable=_print_params,
        params={'param4': 'param defined at task level'}
    )

Output:

{logging_mixin.py:104} INFO - Task_id: python_task_2
{logging_mixin.py:104} INFO - param1:first_param
{logging_mixin.py:104} INFO - param2:second_param
{logging_mixin.py:104} INFO - param4:param defined at task level
{logging_mixin.py:104} INFO - param3:param_from_the_UI

Following the example, you could define a custom operator that inherits from BaseOperator:

class CustomDummyOperator(BaseOperator):

    @apply_defaults
    def __init__(self, custom_arg: str = 'default', *args, **kwargs) -> None:
        self.custom_arg = custom_arg
        super(CustomDummyOperator, self).__init__(*args, **kwargs)

    def execute(self, context):
        print(f"Task_id: {self.task_id}")
        # print the operator's own arg as well, as shown in the output below
        print(f"custom_arg: {self.custom_arg}")
        for k, v in context['params'].items():
            print(f"{k}:{v}")

An example task would be:

    custom_op_task = CustomDummyOperator(
        task_id='custom_operator_task'
    )

Output:

{logging_mixin.py:104} INFO - Task_id: custom_operator_task
{logging_mixin.py:104} INFO - custom_arg: default
{logging_mixin.py:104} INFO - param1:first_param
{logging_mixin.py:104} INFO - param2:second_param
{logging_mixin.py:104} INFO - param3:param_from_the_UI
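
Since the goal here is ultimately to run the KubernetesPodOperator, the same pattern carries over to a subclass of it. A minimal sketch, assuming the cncf.kubernetes provider is installed (the class name and the commented YAML handling are placeholders, not a full implementation):

    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

    class NewKPO(KubernetesPodOperator):

        def execute(self, context):
            # params defined on the DAG/task, plus any extras added when
            # triggering from the UI, are available in the context here
            for k, v in context['params'].items():
                print(f"{k}:{v}")
            # parse the YAML and adjust pod fields as needed, then
            # defer to the base class to actually run the pod
            return super(NewKPO, self).execute(context)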

Imports:

from airflow import DAG
from airflow.models.baseoperator import chain
from airflow.models import BaseOperator
from airflow.operators.python import PythonOperator
from airflow.operators.bash import BashOperator
from airflow.utils.dates import days_ago
from airflow.utils.decorators import apply_defaults

I hope that works for you!

– NicoE
  • Hey Nico, I really appreciate the thorough answer. While what you are saying makes complete sense, this is not exactly what I am looking for. I would use the UI to add the extra parameters, but I would want the Python function (prep_kubernetes_pod_operator) I wrote as an example to pick them up. So that wouldn't be a callable for the PythonOperator, because ultimately I will be running the KubernetesPodOperator. I hope I managed to explain it properly this time. Would there be a solution for this usage? Many thanks. – a54i Jun 24 '21 at 20:10
  • @a54i Ok, now I get it, sorry! I don't think there is a way to access `params` or `conf` from an "arbitrary" function as in the example you provided. So I can think of two options. One: use the [template fields](https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/_api/airflow/providers/cncf/kubernetes/operators/kubernetes_pod/index.html) of the `KubernetesPodOperator`, which are `'image', 'cmds', 'arguments', 'env_vars', 'labels', 'config_file', 'pod_template_file'`, with Jinja templating, in the same way as shown in the BashOperator example above. – NicoE Jun 24 '21 at 21:48
  • The other approach, if you need to access those params, maybe process them, and pass them as args to the `KubernetesPodOperator` through fields other than the `template_fields`: you could consider creating a custom operator extending KubernetesPodOperator. This will allow you to do pretty much whatever you need, and to create the tasks directly by instantiating this custom operator, removing the function. Let me know if that worked for you! – NicoE Jun 24 '21 at 21:54
  • Thanks Nico, yes I believe the best solution would be to create a custom operator. So would I just create a new operator that inherits from the KubernetesPodOperator, in which I can change some params like self.image etc., and the parent operator would execute after that? Sorry, I am not too familiar with these. – a54i Jun 25 '21 at 12:42
  • @a54i Try [this guide](https://airflow.apache.org/docs/apache-airflow/stable/howto/custom-operator.html) on how to create a custom operator. You will extend `KubernetesPodOperator`, probably adding more arguments to `__init__`, and then in the `execute` method add the new behaviour you want before calling your base class's `execute` with something like `return super(MyCustomOp, self).execute(context)`. Think of this new operator as a wrapper. Within execute() you have access to the `context` parameter, as shown in the example above. – NicoE Jun 25 '21 at 15:36
  • Hey @NicoE, yes I was using that guide, and thanks for clarifying the return part. I only have one issue left to sort out. I want to run my operator as `CustomOp(task_id, yaml)`, with the plan to extract the yaml inside and populate the arguments. However, the base `KubernetesPodOperator` needs a `name` argument when initiated, otherwise the `_set_name(name)` validation fails. I am trying to start with a default string value via `super().__init__(name="default", **kwargs)` and later do `self.name = "pod_name"`, but that doesn't seem to work, as self.name becomes a tuple. How could I overcome this? – a54i Jun 28 '21 at 15:01
  • Hey @a54i, I just edited the answer to add a default arg to the `CustomDummyOperator` class constructor. You could do the same, which is basically just initializing your `CustomOp` with the `name` arg that the base class is expecting; that should work. If this worked for you, please do consider marking the answer as [accepted](https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work/5235#5235). – NicoE Jun 29 '21 at 21:22
  • Thanks again for your answer, I updated the post to clarify the problem. In the meantime I will mark the solution correct, but I would appreciate it if you could point out what I am doing wrong. Many thanks! – a54i Jun 29 '21 at 22:00
  • You don't need to re-write every arg in your `CustomOp`, just call it as you would the base op, passing the same params (i.e. `task_id`, `image`, `name`, etc.). The problem with `secrets` is not related to the fact that you are extending the operator. The `secrets` param is expected to be `list[airflow.kubernetes.secret.Secret]`, so you could create the Secret objects first; check [this example](https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/_modules/airflow/providers/cncf/kubernetes/example_dags/example_kubernetes.html). – NicoE Jun 29 '21 at 23:56
  • @a54i Even if my last comment doesn't solve your issue, I highly recommend creating a new question focused on the new issues you are finding; that way it will be easier for anyone (not just me) to help and provide different answers. – NicoE Jun 30 '21 at 00:00
  • The problem is not specific to the secrets; all arguments that I assign are becoming tuples. I will open a new question as you recommended. Many thanks again, Nico! – a54i Jun 30 '21 at 08:21
  • @NicoE, in the `CustomDummyOperator`, when you define the execute function, did you have to import context from something like `airflow.operators.python`, or is context available within the base operator? – Kay Jul 22 '21 at 15:14
  • @Kay `context` is a parameter of the `execute` method from `BaseOperator`. [execute](https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/models/baseoperator/index.html#airflow.models.baseoperator.BaseOperator.execute) is the main method to override when creating your custom operator. – NicoE Jul 22 '21 at 18:05