I have been reading a lot about logging in to Airflow and experimenting a lot but could not achieve what I am looking for. I want to customize the logging for Airflow. I have a lot of DAGs. Each DAG has multiple tasks. EAch DAG has multiple tasks corresponding to one alert_name. DAG is running hourly and pushing logs to S3. If something goes wrong and debugging is required it is very tough to look for logs in S3. I want to customize the logs to search the log lines by RUN_ID and alert_name.
I have the following piece of code.
from airflow import DAG # noqa
from datetime import datetime
from datetime import timedelta
from airflow.operators.python_operator import PythonOperator
import y
import logging
log = logging.getLogger(__name__)
default_args = {
'owner': 'SRE',
'execution_timeout': timedelta(minutes=150)
}
dag = DAG(
dag_id = 'new_dag',
default_args = default_args,
start_date = datetime(year=2021, month=11, day=22),
schedule_interval = timedelta(days=1),
catchup = False,
max_active_runs = 3,
)
def implement_alert_logic(alert_name):
log.info(f'In the implementation for {alert_name}')
pass
def myfunc(**wargs):
for alert in ['alert_1', 'alert_2', 'alert_3']:
log.info(f'Executing logic for {alert}')
implement_alert_logic(alert)
t1 = PythonOperator(
task_id='testing_this',
python_callable = myfunc,
provide_context=True,
dag=dag)
t2 = PythonOperator(
task_id='testing_this2',
python_callable = myfunc,
provide_context=True,
dag=dag)
t1 >> t2
It prints something like
[2022-06-13, 08:16:54 UTC] {myenv.py:32} INFO - Executing logic for alert_1
[2022-06-13, 08:16:54 UTC] {myenv.py:27} INFO - In the implementation for alert_1
[2022-06-13, 08:16:54 UTC] {myenv.py:32} INFO - Executing logic for alert_2
[2022-06-13, 08:16:54 UTC] {myenv.py:27} INFO - In the implementation for alert_2
[2022-06-13, 08:16:54 UTC] {myenv.py:32} INFO - Executing logic for alert_3
[2022-06-13, 08:16:54 UTC] {myenv.py:27} INFO - In the implementation for alert_3
Actual code is much more complex and sophisticated than this. That's why I need faster and more customized debugging logs.
What I am trying to achieve is to customize the log formatter and add RUN_ID and alert_name as part of the log_message
Logs should be something like this:
[2022-06-13, 08:16:54 UTC] [manual__2022-06-13T08:16:54.103265+00:00] {myenv.py:32} INFO - Executing logic for alert_1
[2022-06-13, 08:16:54 UTC] [manual__2022-06-13T08:16:54.103265+00:00] [alert1]{myenv.py:32} INFO - In the implementation for alert_1