
I am trying to pass configuration parameters to the Airflow CLI while triggering a DAG run. The following is the trigger_dag command I am using:

airflow trigger_dag -c '{"account_list":"[1,2,3,4,5]", "start_date":"2016-04-25"}' insights_assembly_9900

My problem is: how can I access the conf parameters that were passed, from inside an operator during the DAG run?

– devj

4 Answers


This is probably a continuation of the answer provided by devj.

  1. In airflow.cfg, the following property should be set to True: dag_run_conf_overrides_params = True

  2. While defining the PythonOperator, pass the argument provide_context=True. For example:

get_row_count_operator = PythonOperator(task_id='get_row_count', python_callable=do_work, dag=dag, provide_context=True)

  3. Define the Python callable (note the use of **kwargs):

def do_work(**kwargs):
    table_name = kwargs['dag_run'].conf.get('table_name')
    # Rest of the code

  4. Invoke the DAG from the command line:

airflow trigger_dag read_hive --conf '{"table_name":"my_table_name"}'
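
Putting these steps together, a minimal sketch of the complete DAG file (assuming Airflow 1.x-style imports and provide_context; the dag_id read_hive, the task id, and the table_name key are just the example values from the steps above):

# Minimal sketch combining the steps above (Airflow 1.x style assumed)
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def do_work(**kwargs):
    # dag_run is available in the context because provide_context=True;
    # its .conf holds the JSON payload passed via --conf
    table_name = kwargs['dag_run'].conf.get('table_name')
    print("Working on table: {}".format(table_name))


dag = DAG(
    dag_id='read_hive',
    start_date=datetime(2016, 4, 1),
    schedule_interval=None,  # run only when triggered externally
)

get_row_count_operator = PythonOperator(
    task_id='get_row_count',
    python_callable=do_work,
    provide_context=True,
    dag=dag,
)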

I have found this discussion to be helpful.

– Arnab Biswas

There are two ways to access the params passed in the airflow trigger_dag command.

  1. In the callable method defined for the PythonOperator, access the params as kwargs['dag_run'].conf.get('account_list')

  2. If the field you are using it in is a templated field, you can use {{ dag_run.conf['account_list'] }}

The schedule_interval for the externally triggerable DAG should be set to None for the above approaches to work.
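
For illustration, a minimal sketch showing both approaches in one DAG file (the dag_id and task ids are placeholders, Airflow 1.x-style imports are assumed, and the account_list key comes from the question):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

dag = DAG(
    dag_id='insights_assembly_9900',
    start_date=datetime(2016, 4, 1),
    schedule_interval=None,  # externally triggered only
)

# 1. Access the conf via the dag_run object inside a PythonOperator callable
def print_accounts(**kwargs):
    account_list = kwargs['dag_run'].conf.get('account_list')
    print(account_list)

python_task = PythonOperator(
    task_id='print_accounts',
    python_callable=print_accounts,
    provide_context=True,
    dag=dag,
)

# 2. Access the conf via Jinja in a templated field (bash_command is templated)
bash_task = BashOperator(
    task_id='echo_accounts',
    bash_command="echo {{ dag_run.conf['account_list'] }}",
    dag=dag,
)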

– devj
    is there a way to access `dag_run` from within a `with DAG() as dag:` block? i would like to parse `params` values into tasks based on say whether a `conf` key exists and if not, take a `default_arg` value instead (rather than putting too much logic into the jinja template itself). – yee379 Mar 19 '18 at 09:20
  • maybe `conf = dag.get_dagrun(execution_date=dag.latest_execution_date).conf` – fpopic Oct 20 '19 at 14:44
    @yee379 No; in a Dag file you are defining the DAG but the Run only exists after it is scheduled or triggered. Like a task is defined but a task instance only exists during a run. Do not change the dag structure dynamically per run. Instead make tasks that either no-op or skip themselves based on conditions. – dlamblin Apr 30 '20 at 00:20
  • @dlamblin, Wait, why wouldn't we want to be able to access per run configurations from within the DAG context? By pushing the context down to the operators this means if someone wants to use a plugin model (say, have one configuration file per customer) for the same DAG this means you have to hide the constants in the operators instead of exposing them in the DAG definition. Fundamentally they both work but it removes the ability to be explicit if you want to be. – AlexLordThorsen Jan 25 '22 at 22:29
  • @yee379 were you able to solve this? Even I want to access the conf passed form the cmd/UI within the DAG – Akshay Feb 25 '22 at 09:44

In case you are trying to access the Airflow system-wide config (instead of a DAG run config), the following might help:

Firstly, import this

from airflow.configuration import conf

Secondly, get the value somewhere

conf.get("core", "my_key")

Optionally, set a value with

conf.set("core", "my_key", "my_val")
– zahir hamroune

For my use case, I had to pass arguments to the Airflow workflow (or task) using the API. My workflow is as follows: a Lambda is triggered when a new file lands in the S3 bucket; the Lambda in turn triggers an Airflow DAG and passes it the bucket name and the key of the file.

Here's my solution:

import json
import urllib.parse

import boto3
import requests

# Assumed to be defined elsewhere (e.g. via Lambda environment variables):
# mwaa_env_name, dag_name, and mwaa_cli_command (the Airflow CLI verb, e.g. "trigger_dag")

s3 = boto3.client('s3')
mwaa = boto3.client('mwaa')

def lambda_handler(event, context):
    # print("Received event: " + json.dumps(event, indent=2))

    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    
    mwaa_cli_token = mwaa.create_cli_token(
        Name=mwaa_env_name
    )
    
    mwaa_auth_token = 'Bearer ' + mwaa_cli_token['CliToken']
    mwaa_webserver_hostname = 'https://{0}/aws_mwaa/cli'.format(mwaa_cli_token['WebServerHostname'])
    
    conf = {'bucket': bucket, 'key': key}
    raw_data = """{0} {1} --conf '{2}'""".format(mwaa_cli_command, dag_name, json.dumps(conf))
    
    # pass the key and bucket name to airflow to initiate the workflow
    requests.post(
            mwaa_webserver_hostname,
            headers={
                'Authorization': mwaa_auth_token,
                'Content-Type': 'text/plain'
                },
            data=raw_data
            )
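
On the Airflow side, the triggered DAG can then read the bucket and key back out of dag_run.conf. A minimal sketch of such a receiving DAG (the dag_id, task id, and Airflow 2.x-style import are assumptions; MWAA versions differ):

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def process_new_file(**context):
    # conf carries whatever the Lambda put into --conf
    run_conf = context['dag_run'].conf or {}
    bucket = run_conf.get('bucket')
    key = run_conf.get('key')
    print("New object: s3://{}/{}".format(bucket, key))


with DAG(
    dag_id='process_s3_file',
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,  # triggered by the Lambda only
) as dag:
    PythonOperator(
        task_id='process_new_file',
        python_callable=process_new_file,
    )
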
– atosh502