
I'm working with Airflow 2.1.4 and looking to find the status of the prior task run (Task Run, not Task Instance and not Dag Run).

For example, the DAG MorningWorkflow runs at 9:00am, and the task ConditionalTask is in that DAG. The task has precondition logic that throws an AirflowSkipException in a number of situations (including the time of day and other context-specific information, to reduce the likelihood of collisions with independent processes).

If ConditionalTask fails, we can fix the issue, clear the failed run, and re-run it without running the entire DAG. However, the skip logic reruns and will often now skip it, even though the original conditions were non-skipping.

So, I want to update the precondition logic to never skip if this task instance ran previously and failed. I can determine whether the task instance ran previously using TaskInstance.try_number or TaskInstance.prev_attempted_tries, but this doesn't tell me whether it actually tried to run originally or whether it skipped (i.e., if we clear the entire DagRun to rerun the whole workflow, we would want it to still skip).

An alternative would be to determine whether the first attempted run was skipped or not.

Kevin Crouse

1 Answer


@Kevin Crouse

To answer your question, we can use the DagRun model, imported with: from airflow.models import DagRun

To give you a complete answer, I have created two functions that should help with similar problems in the future.

How to return the overall state/success of a specific dag_id passed as a function arg?

def get_last_dag_run_status(dag_id):
    """Return the state and execution date of the latest DAG run for dag_id.

    1. Use the find method of the DagRun class.
    2. Step 1 returns a list, so sort it by execution date, newest first.
    3. Return a) the state and b) the execution date of the newest run;
       return last_dag_run[0] instead to explore the full DagRun object.

    Args:
        dag_id (str): The dag_id to check
    Returns:
        list: [state, execution_date] of the latest DAG run
    Raises:
        IndexError: if no DAG run exists for dag_id
    """
    last_dag_run = DagRun.find(dag_id=dag_id)
    last_dag_run.sort(key=lambda x: x.execution_date, reverse=True)
    return [last_dag_run[0].state, last_dag_run[0].execution_date]

How to return the status of a specific task_id, within a specific dag_id?

def get_task_status(dag_id, task_id):
    """Return the state of task_id in the latest DAG run for dag_id.

    1. The lookup matches the function above, which serves as the foundation
       for many similar problems.
    2. The key difference is the return statement, which calls
       get_task_instance with the desired task_id and reads its state.

    Args:
        dag_id (str): The dag_id to check
        task_id (str): The task_id to check
    Returns:
        str: The state of the task instance in the latest DAG run
    Raises:
        IndexError: if no DAG run exists for dag_id
        AttributeError: if the latest run has no task instance for task_id
    """
    last_dag_run = DagRun.find(dag_id=dag_id)
    last_dag_run.sort(key=lambda x: x.execution_date, reverse=True)
    return last_dag_run[0].get_task_instance(task_id).state
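
As a rough illustration of how a state like the one returned above could feed the precondition logic from the question, here is a minimal sketch. The helper name `should_skip` and its parameters are hypothetical, and it assumes the caller already holds the state of the earlier attempt; how and when that state is read is left open. The string "failed" matches the value of State.FAILED in airflow.utils.state.

```python
from typing import Optional

FAILED = "failed"  # same string value as airflow.utils.state.State.FAILED


def should_skip(prior_state: Optional[str], precondition_says_skip: bool) -> bool:
    """Decide whether the task should skip (hypothetical helper).

    prior_state: state of the earlier attempt, or None if there was none.
    precondition_says_skip: result of the existing precondition checks.
    """
    # An earlier failed attempt always overrides the precondition checks.
    if prior_state == FAILED:
        return False
    # Otherwise defer to the existing precondition logic.
    return precondition_says_skip


print(should_skip("failed", True))   # earlier attempt failed: do not skip
print(should_skip("skipped", True))  # earlier attempt skipped: still skip
print(should_skip(None, False))      # no earlier attempt: precondition decides
```

Inside the task callable, a True result would be the point at which to raise AirflowSkipException.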

I hope this helps you in your journey to resolve your issues.

For posterity, here is a complete dummy DAG that demonstrates the two functions working.

from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import PythonOperator
from airflow.models import DagRun
from datetime import datetime

def get_last_dag_run_status(dag_id):
    """Return the state and execution date of the latest DAG run for dag_id.

    Args:
        dag_id (str): The dag_id to check
    Returns:
        list: [state, execution_date] of the latest DAG run
    """
    last_dag_run = DagRun.find(dag_id=dag_id)
    last_dag_run.sort(key=lambda x: x.execution_date, reverse=True)
    return [last_dag_run[0].state, last_dag_run[0].execution_date]

def get_task_status(dag_id, task_id):
    """Return the state of task_id in the latest DAG run for dag_id.

    Args:
        dag_id (str): The dag_id to check
        task_id (str): The task_id to check
    Returns:
        str: The state of the task instance in the latest DAG run
    """
    last_dag_run = DagRun.find(dag_id=dag_id)
    last_dag_run.sort(key=lambda x: x.execution_date, reverse=True)
    return last_dag_run[0].get_task_instance(task_id).state

with DAG(
  'stack_overflow_ans_1',
  tags = ['SO'],
  start_date = datetime(2022, 1, 1),
  schedule_interval = None,
  catchup = False,
  is_paused_upon_creation = False
) as dag:

  t1 = DummyOperator(
    task_id = 'start'
  )

  t2 = PythonOperator(
    task_id = 'get_last_dag_run_status',
    python_callable = get_last_dag_run_status,
    op_args = ['YOUR_DAG_NAME'],
    do_xcom_push = False
  )

  t3 = PythonOperator(
    task_id = 'get_task_status',
    python_callable = get_task_status,
    op_args = ['YOUR_DAG_NAME', 'YOUR_DAG_TASK_WITHIN_THE_DAG'],
    do_xcom_push = False
  )

  t4 = DummyOperator(
    task_id = 'end'
  )

  t1 >> t2 >> t3 >> t4
dimButTries
  • I am getting `return last_dag_run[0].get_task_instance(task_id).state AttributeError: 'NoneType' object has no attribute 'state'` – Gaurang Shah Oct 20 '22 at 16:40
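
The AttributeError in this comment occurs when get_task_instance returns None, which it does when the latest DAG run has no task instance for the given task_id (for example a misspelled task_id, or a task added after that run was created; both are guesses at likely causes). A guarded variant might look like the sketch below. The helper name `latest_task_state` is hypothetical, and it takes the list of runs as an argument (in Airflow you would pass `DagRun.find(dag_id=dag_id)`), so the stub objects here stand in for real DagRun objects.

```python
from datetime import datetime
from types import SimpleNamespace
from typing import Any, Dict, Optional, Sequence


def latest_task_state(dag_runs: Sequence[Any], task_id: str) -> Optional[str]:
    """Return the task's state in the newest run, or None if unavailable."""
    if not dag_runs:
        # No runs at all: avoids the IndexError from indexing an empty list.
        return None
    latest = max(dag_runs, key=lambda run: run.execution_date)
    ti = latest.get_task_instance(task_id)
    # get_task_instance returns None when the run has no such task instance.
    return ti.state if ti is not None else None


def _stub_run(execution_date: datetime, states: Dict[str, str]) -> SimpleNamespace:
    """Build a stand-in for a DagRun (illustration only, not an Airflow API)."""
    return SimpleNamespace(
        execution_date=execution_date,
        get_task_instance=lambda task_id: (
            SimpleNamespace(state=states[task_id]) if task_id in states else None
        ),
    )


runs = [
    _stub_run(datetime(2022, 1, 1), {"start": "success"}),
    _stub_run(datetime(2022, 1, 2), {"start": "failed"}),
]
print(latest_task_state(runs, "start"))    # failed
print(latest_task_state(runs, "missing"))  # None
print(latest_task_state([], "start"))      # None
```

Returning None leaves it to the caller to decide how to treat a missing run or task instance.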