I am running into a problem where a SubDagOperator containing many tasks fails with a MySQL deadlock.
Problem statement: we need to delete more than 1,000 tables in BigQuery, and we are using BigQueryTableDeleteOperator for that purpose.
Because there are more than 1,000 tables, we wrapped the delete tasks in a SubDagOperator to keep the UI cleaner.
Code:
from airflow import DAG
from airflow.contrib.operators import bigquery_table_delete_operator

def subdag_table(parent_dag_name, child_dag_name, args):
    # Define the subdag
    DAG_ID = '%s.%s' % (parent_dag_name, child_dag_name)
    dag_subdag = DAG(
        DAG_ID,
        default_args=args,
        catchup=False,
        max_active_runs=1,
        schedule_interval=None,
    )
    tables_delete = some_function()  # Returns the list of tables to be deleted
    with dag_subdag:
        list_operator = []
        for i, table in enumerate(tables_delete):
            # This creates one task per table to be deleted
            delete_table_task = bigquery_table_delete_operator.BigQueryTableDeleteOperator(
                task_id="delete_table_" + str(table),
                deletion_dataset_table=table,
                ignore_if_missing=True,
            )
            list_operator.append(delete_table_task)
    return dag_subdag
# Define the parent Airflow DAG
with dag:
    delete_tables = SubDagOperator(
        task_id='task_name',
        subdag=subdag_table(parent_dag_name=DAG_ID,
                            child_dag_name='child_dag_name',
                            args=default_args),
        executor=get_default_executor(),
    )
The above code fails with the following deadlock error:
[2021-12-06 22:20:42,753] {base_executor.py:58} INFO - Adding to queue: ['airflow', 'run', 'delete_table_<table_name>', '2021-11-05T21:36:59.091001+00:00', '--local', '--pool', 'default_pool', '-sd']
[2021-12-06 22:20:43,113] {taskinstance.py:1152} ERROR - (_mysql_exceptions.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction')
[SQL: UPDATE task_instance SET state=%s, queued_dttm=%s WHERE task_instance.task_id = %s AND task_instance.dag_id = %s AND task_instance.execution_date = %s]
[parameters: ('queued', datetime.datetime(2021, 12, 6, 22, 20, 42, 746380), 'delete_table_<table_name>', datetime.datetime(2021, 11, 5, 21, 36, 59, 91001))]
(Background on this error at: http://sqlalche.me/e/e3q8)
Traceback (most recent call last):
  File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1246, in _execute_context
    cursor, statement, parameters, context
  File "/opt/python3.6/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 588, in do_execute
    cursor.execute(statement, parameters)
  File "/opt/python3.6/lib/python3.6/site-packages/MySQLdb/cursors.py", line 255, in execute
    self.errorhandler(self, exc, value)
  File "/opt/python3.6/lib/python3.6/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
    raise errorvalue
  File "/opt/python3.6/lib/python3.6/site-packages/MySQLdb/cursors.py", line 252, in execute
    res = self._query(query)
  File "/opt/python3.6/lib/python3.6/site-packages/MySQLdb/cursors.py", line 378, in _query
    db.query(q)
  File "/opt/python3.6/lib/python3.6/site-packages/MySQLdb/connections.py", line 280, in query
    _mysql.connection.query(self, query)
_mysql_exceptions.OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')
The above exception was the direct cause of the following exception:
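Since MySQL error 1213 is transient ("try restarting transaction"), one thing I considered is simply letting the affected tasks retry instead of failing immediately, via retry settings in default_args. A minimal sketch (the values are arbitrary placeholders):

from datetime import timedelta

# Sketch only: retry settings so a task hit by a transient deadlock is re-queued
default_args = {
    'owner': 'airflow',
    'retries': 3,                          # placeholder value
    'retry_delay': timedelta(minutes=2),   # placeholder value
    'retry_exponential_backoff': True,     # spread retries out to reduce DB contention
}

This would only mask the symptom, though; the contention presumably comes from queueing 1,000+ task instances at once.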
I have also tried assigning a dedicated queue to the SubDagOperator, as suggested in the link and in a Medium post, but with that change the SubDagOperator never gets triggered at all; it just sits in the queued state.
Here is the modified code I tried:
def subdag_table(parent_dag_name, child_dag_name, args, queue=None, concurrency=None):
    # Define the subdag
    DAG_ID = '%s.%s' % (parent_dag_name, child_dag_name)
    dag_subdag = DAG(
        DAG_ID,
        default_args=args,
        catchup=False,
        max_active_runs=1,
        schedule_interval=None,
        concurrency=concurrency,
    )
    tables_delete = some_function()  # Returns the list of tables to be deleted
    with dag_subdag:
        list_operator = []
        for i, table in enumerate(tables_delete):
            # This creates one task per table to be deleted
            delete_table_task = bigquery_table_delete_operator.BigQueryTableDeleteOperator(
                task_id="delete_table_" + str(table),
                deletion_dataset_table=table,
                ignore_if_missing=True,
                queue=queue,
            )
            list_operator.append(delete_table_task)
    return dag_subdag
# Define the parent Airflow DAG
with dag:
    subdag_queue = "subdag_queue"
    delete_tables = SubDagOperator(
        task_id='task_name',
        subdag=subdag_table(parent_dag_name=DAG_ID,
                            child_dag_name='child_dag_name',
                            args=default_args,
                            queue=subdag_queue,
                            concurrency=10),
        executor=get_default_executor(),
        queue=subdag_queue,
    )
Can anyone suggest how to fix this deadlock, or a better implementation for the problem statement? According to the documentation, TaskGroups are the better solution, but they were introduced in Airflow 2; we are on 1.10.14 and upgrading will take some time, so I am looking for something more feasible in the meantime.
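For reference, one direction I am considering (just a sketch, untested; it assumes the contrib BigQueryHook and its cursor's run_table_delete method available in 1.10.x, and the batch size and task IDs are made up) is to drop the per-table tasks entirely and delete the tables in a few batched PythonOperator tasks, so far fewer task instances hit the metadata database at once:

from airflow.contrib.hooks.bigquery_hook import BigQueryHook
from airflow.operators.python_operator import PythonOperator

def delete_tables_batch(tables):
    # Delete a batch of tables through a single hook/cursor
    hook = BigQueryHook(bigquery_conn_id='bigquery_default')
    cursor = hook.get_conn().cursor()
    for table in tables:
        cursor.run_table_delete(table, ignore_if_missing=True)

tables_delete = some_function()  # same list of tables as above
batch_size = 100  # arbitrary

with dag:
    for i in range(0, len(tables_delete), batch_size):
        PythonOperator(
            task_id='delete_tables_batch_{}'.format(i // batch_size),
            python_callable=delete_tables_batch,
            op_kwargs={'tables': tables_delete[i:i + batch_size]},
        )

Would that be the recommended workaround on 1.10.14, or is there a way to keep the SubDagOperator approach from deadlocking?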