
I am running into a problem where a SubDagOperator containing many tasks fails with a database deadlock error.

Problem statement: We have to delete more than 1000 tables in BigQuery. For that purpose we are using BigQueryTableDeleteOperator. Since there are more than 1000 tables, we wrapped the code in a SubDagOperator for a cleaner UI. Code:

from airflow import DAG
from airflow.contrib.operators import bigquery_table_delete_operator


def subdag_table(parent_dag_name, child_dag_name, args):
    # define subdag
    DAG_ID = '%s.%s' % (parent_dag_name, child_dag_name)

    dag_subdag = DAG(
        DAG_ID,
        default_args=args, catchup=False,
        max_active_runs=1, schedule_interval=None
    )

    tables_delete = some_function()  # this function returns the list of tables to be deleted

    with dag_subdag:
        list_operator = []

        for i, table in enumerate(tables_delete):
            # this creates one delete task per table
            delete_table_task = bigquery_table_delete_operator.BigQueryTableDeleteOperator(
                task_id=f"delete_table_{table}",
                deletion_dataset_table=table,
                ignore_if_missing=True)

            list_operator.append(delete_table_task)

    return dag_subdag


from airflow.executors import get_default_executor
from airflow.operators.subdag_operator import SubDagOperator

# Define Airflow DAG (dag, DAG_ID and default_args are defined earlier in the file)
with dag:

    delete_tables = SubDagOperator(
        task_id='task_name',
        subdag=subdag_table(parent_dag_name=DAG_ID,
                            child_dag_name='child_dag_name',
                            args=default_args),
        executor=get_default_executor()
    )

The above code fails with a deadlock error:

[2021-12-06 22:20:42,753] {base_executor.py:58} INFO - Adding to queue: ['airflow', 'run', 'delete_table_<table_name>', '2021-11-05T21:36:59.091001+00:00', '--local', '--pool', 'default_pool', '-sd']
[2021-12-06 22:20:43,113] {taskinstance.py:1152} ERROR - (_mysql_exceptions.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction')
[SQL: UPDATE task_instance SET state=%s, queued_dttm=%s WHERE task_instance.task_id = %s AND task_instance.dag_id = %s AND task_instance.execution_date = %s]
[parameters: ('queued', datetime.datetime(2021, 12, 6, 22, 20, 42, 746380), 'delete_table_<table_name>', datetime.datetime(2021, 11, 5, 21, 36, 59, 91001))]
(Background on this error at: http://sqlalche.me/e/e3q8)
Traceback (most recent call last):
  File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1246, in _execute_context
    cursor, statement, parameters, context
  File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 588, in do_execute
    cursor.execute(statement, parameters)
  File "/opt/python3.6/lib/python3.6/site-packages/MySQLdb/cursors.py", line 255, in execute
    self.errorhandler(self, exc, value)
  File "/opt/python3.6/lib/python3.6/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
    raise errorvalue
  File "/opt/python3.6/lib/python3.6/site-packages/MySQLdb/cursors.py", line 252, in execute
    res = self._query(query)
  File "/opt/python3.6/lib/python3.6/site-packages/MySQLdb/cursors.py", line 378, in _query
    db.query(q)
  File "/opt/python3.6/lib/python3.6/site-packages/MySQLdb/connections.py", line 280, in query
    _mysql.connection.query(self, query)
_mysql_exceptions.OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')

The above exception was the direct cause of the following exception:

I have tried specifying a queue for the SubDagOperator, as suggested in the linked answer and a Medium post, but in that case the SubDagOperator is not triggered at all; it just sits in the queued state.

Here is the modified code I tried:

def subdag_table(parent_dag_name, child_dag_name, args, queue=None, concurrency=None):
    # define subdag
    DAG_ID = '%s.%s' % (parent_dag_name, child_dag_name)

    dag_subdag = DAG(
        DAG_ID,
        default_args=args, catchup=False,
        max_active_runs=1, schedule_interval=None,
        concurrency=concurrency
    )

    tables_delete = some_function()  # this function returns the list of tables to be deleted

    with dag_subdag:
        list_operator = []

        for i, table in enumerate(tables_delete):
            # this creates one delete task per table, pinned to the given queue
            delete_table_task = bigquery_table_delete_operator.BigQueryTableDeleteOperator(
                task_id=f"delete_table_{table}",
                deletion_dataset_table=table,
                ignore_if_missing=True,
                queue=queue)

            list_operator.append(delete_table_task)

    return dag_subdag

# Define Airflow DAG
with dag:
    subdag_queue = "subdag_queue"

    delete_tables = SubDagOperator(
        task_id='task_name',
        subdag=subdag_table(parent_dag_name=DAG_ID,
                            child_dag_name='child_dag_name',
                            args=default_args,
                            queue=subdag_queue,
                            concurrency=10),
        executor=get_default_executor(),
        queue=subdag_queue
    )

Can anyone suggest how to fix this issue, or a better implementation for the problem statement? According to the documentation, task grouping is the better solution, but it comes with Airflow 2; we are on 1.10.14 and upgrading will take some time, so I am looking for something more feasible in the meantime.
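For reference, my understanding is that the Airflow 2 version would look roughly like the sketch below, using TaskGroup and the providers' BigQueryDeleteTableOperator. It is untested on our side since those APIs only exist in Airflow 2, and the DAG id and table list are placeholders standing in for our some_function():

from datetime import datetime

from airflow import DAG
from airflow.utils.task_group import TaskGroup
from airflow.providers.google.cloud.operators.bigquery import BigQueryDeleteTableOperator

# Placeholder list; in our case this would come from some_function().
TABLES_TO_DELETE = ["project.dataset.table_1", "project.dataset.table_2"]

with DAG(
    "delete_bq_tables",                # hypothetical DAG id
    start_date=datetime(2021, 12, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # One TaskGroup keeps the UI tidy without needing a separate subdag DAG.
    with TaskGroup(group_id="delete_tables"):
        for table in TABLES_TO_DELETE:
            BigQueryDeleteTableOperator(
                task_id=f"delete_table_{table}",
                deletion_dataset_table=table,
                ignore_if_missing=True,
            )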


1 Answer


There can be many ways, but the following two look good to me.

  1. Create a DAG that runs a Dataflow job, and have that Dataflow job delete all the BigQuery tables.

  2. Create two DAGs: DAG1 as the main DAG, which loops over the tables and triggers DAG2 once per table, passing the table name in the context each time (use TriggerDagRunOperator to trigger DAG2).

You can refer to the following answer for the second method (how to use and trigger an external DAG multiple times); a minimal sketch is also included after the link.

Run another DAG with TriggerDagRunOperator multiple times
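For the second method on Airflow 1.10.x, a minimal sketch could look roughly like this. The DAG ids bq_delete_main / bq_delete_single_table and the hard-coded table list are placeholders; swap in your own some_function() and default arguments:

from datetime import datetime

from airflow import DAG
from airflow.contrib.operators import bigquery_table_delete_operator
from airflow.operators.dagrun_operator import TriggerDagRunOperator

default_args = {"start_date": datetime(2021, 12, 1)}

# Placeholder list; in practice this would come from your table-listing helper.
TABLES_TO_DELETE = ["project.dataset.table_1", "project.dataset.table_2"]

# DAG2: deletes exactly one table, whose name arrives via dag_run.conf.
with DAG("bq_delete_single_table", default_args=default_args,
         schedule_interval=None, catchup=False) as dag_worker:
    bigquery_table_delete_operator.BigQueryTableDeleteOperator(
        task_id="delete_table",
        # deletion_dataset_table is a templated field in the contrib operator,
        # so the value passed by the trigger is rendered at runtime.
        deletion_dataset_table="{{ dag_run.conf['table'] }}",
        ignore_if_missing=True,
    )

def _payload_for(table):
    # In 1.10.x the payload is set through python_callable(context, dag_run_obj);
    # the closure avoids the late-binding problem inside the loop.
    def _set_payload(context, dag_run_obj):
        dag_run_obj.payload = {"table": table}
        return dag_run_obj
    return _set_payload

# DAG1: loops over the table list and fires one DAG2 run per table.
with DAG("bq_delete_main", default_args=default_args,
         schedule_interval=None, catchup=False) as dag_main:
    for i, table in enumerate(TABLES_TO_DELETE):
        TriggerDagRunOperator(
            task_id="trigger_delete_%d" % i,
            trigger_dag_id="bq_delete_single_table",
            python_callable=_payload_for(table),
        )

Each triggered run carries its own dag_run.conf, so the single-table DAG stays tiny and you avoid updating thousands of task_instance rows for one DAG run at once.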