I have a problem when I trigger multiple DAG runs in MWAA using POST requests. I am using the mw1.small environment class of MWAA and I trigger around 3 runs per minute with EventBridge and Lambda. When I check my results, some runs are missing. In the logs I can see that the trigger was invoked, but the run was never scheduled or queued, and it does not appear in the tree or graph view.
I have 169 rules in EventBridge, each running at a set time every day, but I only see around 165 to 166 executions of the DAG. The problem is not in EventBridge or Lambda: I checked the logs for both services and all 169 DAG invocations were made successfully.
The Lambda function mentioned above triggers the DAG with a POST request for every rule that I have in EventBridge.
These are the configuration options I have set:
celery.pool=1
celery.worker_autoscale=1,1
core.dag_file_processor_timeout=150
core.dagbag_import_timeout=90
core.killed_task_cleanup_time=604800
core.min_serialized_dag_update_interval=60
scheduler.dag_dir_list_interval=300
scheduler.min_file_process_interval=300
scheduler.parsing_processes=1
scheduler.processor_poll_interval=60
scheduler.schedule_after_task_execution=false
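NOTE: I know I could use Step Functions, but that is not an option in my case.
EDIT: The problem is caused by the Lambda function making multiple requests in parallel. Airflow 2.2.2 enforces a unique constraint on (dag_id, execution_date) in the dag_run table (the constraint named dag_run_dag_id_execution_date_key in the first traceback), so two triggers for the same DAG that resolve to the same execution_date cannot both be inserted.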
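To illustrate the collision, here is a minimal sketch in plain Python, with no Airflow involved. It assumes the trigger path drops microseconds from the execution date, which is what the replace_microseconds argument and the whole-second execution_date values in the tracebacks suggest. The helper names to_execution_date and find_collisions are hypothetical, and the set only stands in for the database constraint.

```python
from datetime import datetime, timezone


def to_execution_date(ts: datetime) -> datetime:
    """Drop microseconds (assumption: the trigger path does the same)."""
    return ts.replace(microsecond=0)


def find_collisions(dag_id, timestamps):
    """Return the timestamps whose (dag_id, execution_date) key was already taken."""
    seen = set()  # stands in for the unique constraint on dag_run
    rejected = []
    for ts in timestamps:
        key = (dag_id, to_execution_date(ts))
        if key in seen:
            rejected.append(ts)
        else:
            seen.add(key)
    return rejected


# Three triggers for the same DAG; the first two arrive within the same second.
arrivals = [
    datetime(2023, 2, 16, 20, 19, 55, 120000, tzinfo=timezone.utc),
    datetime(2023, 2, 16, 20, 19, 55, 870000, tzinfo=timezone.utc),
    datetime(2023, 2, 16, 20, 19, 56, 10000, tzinfo=timezone.utc),
]
rejected = find_collisions("test_dag", arrivals)
print(len(rejected))  # 1: the second arrival collides with the first
```

Under that assumption, any two parallel requests for the same DAG that land in the same second map to the same key, which would explain why a few of the 169 invocations go missing.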
The two types of traceback that I found are:
/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/api/common/experimental/trigger_dag.py:91 DeprecationWarning: Calling `DAG.create_dagrun()` without an explicit data interval is deprecated
Traceback (most recent call last):
File "/usr/local/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
cursor, statement, parameters, context
File "/usr/local/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "dag_run_dag_id_execution_date_key"
DETAIL: Key (dag_id, execution_date)=(test_dag, 2023-02-16 20:19:55+00) already exists.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/airflow/.local/bin/airflow", line 8, in <module>
sys.exit(main())
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/__main__.py", line 48, in main
args.func(args)
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/cli/commands/dag_command.py", line 138, in dag_trigger
dag_id=args.dag_id, run_id=args.run_id, conf=args.conf, execution_date=args.exec_date
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/api/client/local_client.py", line 30, in trigger_dag
dag_id=dag_id, run_id=run_id, conf=conf, execution_date=execution_date
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/api/common/experimental/trigger_dag.py", line 125, in trigger_dag
replace_microseconds=replace_microseconds,
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/api/common/experimental/trigger_dag.py", line 91, in _trigger_dag
dag_hash=dag_bag.dags_hash.get(dag_id),
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/utils/session.py", line 70, in wrapper
return func(*args, session=session, **kwargs)
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/models/dag.py", line 2359, in create_dagrun
session.flush()
File "/usr/local/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2540, in flush
self._flush(objects)
File "/usr/local/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2682, in _flush
transaction.rollback(_capture_exception=True)
File "/usr/local/airflow/.local/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
with_traceback=exc_tb,
File "/usr/local/airflow/.local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2642, in _flush
flush_context.execute()
File "/usr/local/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
rec.execute(self)
File "/usr/local/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/unitofwork.py", line 589, in execute
uow,
File "/usr/local/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
insert,
File "/usr/local/airflow/.local/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 1136, in _emit_insert_statements
statement, params
File "/usr/local/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/airflow/.local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1130, in _execute_clauseelement
distilled_params,
File "/usr/local/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
e, statement, parameters, cursor, context
File "/usr/local/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "/usr/local/airflow/.local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
cursor, statement, parameters, context
File "/usr/local/airflow/.local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "dag_run_dag_id_execution_date_key"
DETAIL: Key (dag_id, execution_date)=(test_dag, 2023-02-16 20:19:55+00) already exists.
[SQL: INSERT INTO dag_run (dag_id, queued_at, execution_date, start_date, end_date, state, run_id, creating_job_id, external_trigger, run_type, conf, data_interval_start, data_interval_end, last_scheduling_decision, dag_hash) VALUES (%(dag_id)s, %(queued_at)s, %(execution_date)s, %(start_date)s, %(end_date)s, %(state)s, %(run_id)s, %(creating_job_id)s, %(external_trigger)s, %(run_type)s, %(conf)s, %(data_interval_start)s, %(data_interval_end)s, %(last_scheduling_decision)s, %(dag_hash)s) RETURNING dag_run.id]
[parameters: {'dag_id': 'test_dag', 'queued_at': datetime.datetime(2023, 2, 16, 20, 19, 56, 168249, tzinfo=Timezone('UTC')), 'execution_date': DateTime(2023, 2, 16, 20, 19, 55, tzinfo=Timezone('UTC')), 'start_date': None, 'end_date': None, 'state': <TaskInstanceState.QUEUED: 'queued'>, 'run_id': 'test22__2023-02-16T20:19:03+602430', 'creating_job_id': None, 'external_trigger': True, 'run_type': <DagRunType.MANUAL: 'manual'>, 'conf': <psycopg2.extensions.Binary object at 0x7fe5917cc900>, 'data_interval_start': DateTime(2023, 2, 16, 20, 19, 55, tzinfo=Timezone('UTC')), 'data_interval_end': DateTime(2023, 2, 16, 20, 19, 55, tzinfo=Timezone('UTC')), 'last_scheduling_decision': None, 'dag_hash': 'a1c4fce80be1afad038a0ccd8a41efcf'}]
(Background on this error at: http://sqlalche.me/e/13/gkpj)
and
Traceback (most recent call last):
File "/usr/local/airflow/.local/bin/airflow", line 8, in <module>
sys.exit(main())
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/__main__.py", line 48, in main
args.func(args)
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/cli/commands/dag_command.py", line 138, in dag_trigger
dag_id=args.dag_id, run_id=args.run_id, conf=args.conf, execution_date=args.exec_date
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/api/client/local_client.py", line 30, in trigger_dag
dag_id=dag_id, run_id=run_id, conf=conf, execution_date=execution_date
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/api/common/experimental/trigger_dag.py", line 125, in trigger_dag
replace_microseconds=replace_microseconds,
File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/api/common/experimental/trigger_dag.py", line 75, in _trigger_dag
f"A Dag Run already exists for dag id {dag_id} at {execution_date} with run id {run_id}"
airflow.exceptions.DagRunAlreadyExists: A Dag Run already exists for dag id test_dag at 2023-02-16 20:20:23+00:00 with run id test21__2023-02-16T20:20:04+061773
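One possible workaround, sketched below, is to detect these two errors in the Lambda function and retry after a short wait so that the retried trigger falls in a different second. This is only a sketch under assumptions: send is a hypothetical callable that performs the POST request and returns the decoded stderr text of the trigger command, and the marker strings are taken from the two tracebacks above. Any other error is left for the caller to handle.

```python
import time
from typing import Callable

# Substrings taken from the two tracebacks above.
DUPLICATE_MARKERS = ("DagRunAlreadyExists", "UniqueViolation")


def is_duplicate_run_error(stderr_text: str) -> bool:
    """True if stderr shows one of the two duplicate-run errors."""
    return any(marker in stderr_text for marker in DUPLICATE_MARKERS)


def trigger_with_retry(
    send: Callable[[], str],
    max_attempts: int = 3,
    wait_seconds: float = 1.1,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it reports no duplicate-run error.

    Returns the number of attempts used. Raises RuntimeError if every
    attempt hit a duplicate-run error. Other errors are not inspected here.
    """
    for attempt in range(1, max_attempts + 1):
        stderr_text = send()
        if not is_duplicate_run_error(stderr_text):
            return attempt
        if attempt < max_attempts:
            # Wait just over one second so the next trigger gets a new
            # whole-second execution date.
            sleep(wait_seconds)
    raise RuntimeError(f"duplicate DAG run after {max_attempts} attempts")


# Usage with a fake send(): the first call collides, the second succeeds.
responses = iter(["psycopg2.errors.UniqueViolation: duplicate key value", ""])
attempts = trigger_with_retry(lambda: next(responses), sleep=lambda s: None)
print(attempts)  # 2
```

If the Lambda function passes an explicit execution date or run id, send would need to build a fresh value on each call, otherwise the retry hits the same key again.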