
I have a process that uses Airflow to execute Docker containers on AWS Fargate. The Docker containers are just running ETLs written in Python. In some of my Python scripts I want to allow team members to pass commands, and I think dag_run.conf will be a good way to accomplish this. I was wondering if there is a way to append the values from dag_run.conf to the command key in the EcsOperator's overrides clause. My overrides clause looks something like this:

                "containerOverrides": [
                    {
                        "name": container_name,
                        "command": c.split(" ")
                    },
                ],```
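
For reference, a minimal sketch of how a clause like this sits inside an EcsOperator task in my DAG (container_name, c, and the cluster/task-definition names are placeholders; the real values come from elsewhere in my setup):

from airflow.providers.amazon.aws.operators.ecs import EcsOperator

# Placeholders standing in for values defined elsewhere in the DAG.
container_name = "my-etl-container"
c = "python etl.py --date 2021-01-01"  # currently a hardcoded command string

run_etl = EcsOperator(
    task_id="run_etl",
    cluster="my-fargate-cluster",
    task_definition="my-etl-task",
    launch_type="FARGATE",
    overrides={
        "containerOverrides": [
            {
                "name": container_name,
                "command": c.split(" "),
            },
        ],
    },
    # network_configuration, aws_conn_id, etc. omitted for brevity
)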
tm_madison

2 Answers


Pass a JSON payload to dag_run.conf with an overrides key; this gets passed into the EcsOperator, which in turn passes it to the underlying boto3 client during the run_task operation.

To override container commands, add the key containerOverrides (to the overrides dict) whose value is a list of dictionaries. Note: you must reference the specific container name.

An example input:

{
    "overrides": {
        "containerOverrides": [
            {
                "name": "my-container-name",
                "command": ["echo", "hello world"]
            }
        ]
    }
}

Notes:

  • Be sure to reference the exact container name
  • Command should be a list of strings.
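
On the DAG side, one way to wire this up is to leave the entire overrides argument as a Jinja template over dag_run.conf. A minimal sketch, assuming render_template_as_native_obj=True on the DAG (so the rendered template becomes a real dict) and placeholder cluster/task-definition names:

from airflow import DAG
from airflow.providers.amazon.aws.operators.ecs import EcsOperator
from airflow.utils.dates import days_ago

with DAG(
    dag_id="run_etl_with_conf",           # illustrative name
    start_date=days_ago(1),
    schedule_interval=None,
    render_template_as_native_obj=True,   # render templates as native objects, not strings
) as dag:
    run_etl = EcsOperator(
        task_id="run_etl",
        cluster="my-fargate-cluster",      # placeholder
        task_definition="my-etl-task",     # placeholder
        launch_type="FARGATE",
        # overrides is a templated field, so this is rendered from
        # dag_run.conf just before run_task is called
        overrides="{{ dag_run.conf.get('overrides', {}) }}",
        # network_configuration, aws_conn_id, etc. omitted for brevity
    )

Triggering the DAG with the JSON above then replaces the container command for that run.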
Yaakov Bressler

I had a very similar problem and here's what I found:

  • You cannot pass the command as a string and then do .split(" ") on it. Airflow templating does not happen when the DAG is parsed; the literal {{ dag_run.conf['command'] }} (or, in my formulation, {{ params.my_command }}) is passed to the EcsOperator and only rendered just before the task runs. So we need to keep the definition (yes, as a string) "{{ params.my_command }}" in the code and pass it through.
  • By default, all parameters for a DAG are passed as string types, but they don't have to be! After playing around with jsonschema a bit, I found that you can express "list of strings" as a parameter type like this: Param(type="array", items={"type": "string"}).
  • The above only ensures that the input can be a list of strings, but you also need to receive it as a list of strings. That functionality is switched on by setting render_template_as_native_obj=True on the DAG.

All put together, you get something like this for your DAG:

from airflow.decorators import dag
from airflow.models.param import Param
from airflow.providers.amazon.aws.operators.ecs import EcsOperator
from airflow.utils.dates import days_ago


@dag(
    default_args={"owner": "airflow"},
    start_date=days_ago(2),
    schedule_interval=None,
    params={"my_command": Param(type="array", items={"type": "string"}, default=[])},
    render_template_as_native_obj=True,
)
def my_command():
    """Run a command manually."""
    # Keep the Jinja template as a string; it is only rendered just before
    # the task runs, and render_template_as_native_obj=True turns the
    # rendered value into a real list of strings.
    command = "{{ params.my_command }}"
    EcsOperator(
        task_id="my_command",
        overrides={
            "containerOverrides": [
                {"name": "my-container-name", "command": command}
            ]
        },
        # ... cluster, task_definition, and the other EcsOperator arguments
    )


dag = my_command()
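
With this in place, the DAG can be triggered with a run-time command, e.g. airflow dags trigger my_command --conf '{"my_command": ["echo", "hello world"]}' (assuming the default dag_run_conf_overrides_params behavior, where conf values override params of the same name).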
Elias Mi