Is there a simple, efficient mechanism to dynamically set the run_as_user parameter for an Airflow 2.0 task instance based upon the user that triggered the DAG execution?

This answer provides a way to determine the triggering user by inspecting the metadata database (earlier answers that rely on templating, such as this, no longer seem to work in Airflow 2.0). Using the metadata inspection approach, I've been able to set run_as_user dynamically via a custom operator, like so:

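from airflow.operators.python import PythonOperator
from airflow.models.log import Log
from airflow.utils.db import create_session

class CustomPythonOperator(PythonOperator):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

        # Look up the owner of the most recent 'trigger' event logged for this DAG.
        # This runs whenever the operator is instantiated, not when the task executes.
        # self.dag_id is only meaningful once the operator is attached to a DAG.
        with create_session() as session:
            result = session.query(Log.owner)\
                .filter(
                    Log.dag_id == self.dag_id,
                    Log.event == 'trigger')\
                .order_by(Log.dttm.desc())\
                .first()

        # Fall back to the 'airflow' user if no trigger event has been logged.
        self.run_as_user = result[0] if result else 'airflow'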

That said, this approach seems problematic for at least two reasons:

  1. Ostensibly, it requires that the scheduler query the metadata database every time it instantiates the custom operator (rather than every time the task is executed). The documentation warns against expensive operations in the __init__ method of a custom operator. Moving the user lookup to the execute method of the custom operator does not work.
  2. I suspect the semantics of this solution are not quite right: Specifically, I suspect that this approach will set run_as_user to the user that most recently triggered the DAG as of the most recent time that the scheduler parsed the DAG, rather than the user associated with the current DAG execution. Depending on the scheduler interval, they may be the same user, but this would seem to introduce a race condition.
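To make the second concern concrete, here is a minimal sketch that uses an in-memory SQLite table as a stand-in for the metadata log. The table layout, the `my_dag` name, the users `alice` and `bob`, and both helper functions are hypothetical and not part of Airflow. In particular, it assumes each trigger row records the execution date of the run it started, which the real log table may not do. It only shows how "most recent trigger for the DAG" and "trigger for this particular run" can disagree when two users trigger the DAG in quick succession.

```python
import sqlite3

# Hypothetical, simplified stand-in for the metadata log table (not Airflow's real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log (dttm TEXT, dag_id TEXT, event TEXT, owner TEXT, execution_date TEXT)"
)
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?, ?, ?)",
    [
        # alice triggers a run, then bob triggers another run five seconds later.
        ("2021-03-01T10:00:00", "my_dag", "trigger", "alice", "2021-03-01T10:00:00"),
        ("2021-03-01T10:00:05", "my_dag", "trigger", "bob", "2021-03-01T10:00:05"),
    ],
)


def latest_trigger_owner(conn, dag_id, default="airflow"):
    """Owner of the most recent trigger event for the DAG, regardless of run."""
    row = conn.execute(
        "SELECT owner FROM log WHERE dag_id = ? AND event = 'trigger' "
        "ORDER BY dttm DESC LIMIT 1",
        (dag_id,),
    ).fetchone()
    return row[0] if row else default


def run_trigger_owner(conn, dag_id, execution_date, default="airflow"):
    """Owner of the trigger event recorded for one specific run (assumed column)."""
    row = conn.execute(
        "SELECT owner FROM log WHERE dag_id = ? AND event = 'trigger' "
        "AND execution_date = ? ORDER BY dttm DESC LIMIT 1",
        (dag_id, execution_date),
    ).fetchone()
    return row[0] if row else default


# While alice's run is still in progress, the two lookups give different answers.
print(latest_trigger_owner(conn, "my_dag"))
print(run_trigger_owner(conn, "my_dag", "2021-03-01T10:00:00"))
```

If the operator were instantiated after bob's trigger but while alice's run was still executing, the first lookup would hand alice's task to bob, which is the race I am worried about.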

Is there a better way to solve this?

Matthew Nizol