
I have been reading a lot about logging in Airflow and experimenting a lot, but I could not achieve what I am looking for. I want to customize the logging for Airflow. I have a lot of DAGs, and each DAG has multiple tasks corresponding to one alert_name. The DAGs run hourly and push logs to S3. If something goes wrong and debugging is required, it is very tough to look for logs in S3. I want to customize the logs so I can search the log lines by RUN_ID and alert_name.

I have the following piece of code.

from airflow import DAG  # noqa
from datetime import datetime, timedelta
from airflow.operators.python_operator import PythonOperator

import logging

log = logging.getLogger(__name__)

default_args = {
    'owner': 'SRE',
    'execution_timeout': timedelta(minutes=150)
}


dag = DAG(
    dag_id='new_dag',
    default_args=default_args,
    start_date=datetime(year=2021, month=11, day=22),
    schedule_interval=timedelta(days=1),
    catchup=False,
    max_active_runs=3,
)

def implement_alert_logic(alert_name):
    # Stand-in for the real alert-handling logic
    log.info(f'In the implementation for {alert_name}')

def myfunc(**wargs):
    for alert in ['alert_1', 'alert_2', 'alert_3']:
        log.info(f'Executing logic for {alert}')
        implement_alert_logic(alert)



t1 = PythonOperator(
    task_id='testing_this',
    python_callable=myfunc,
    provide_context=True,
    dag=dag)

t2 = PythonOperator(
    task_id='testing_this2',
    python_callable=myfunc,
    provide_context=True,
    dag=dag)

t1 >> t2

It prints something like this:

[2022-06-13, 08:16:54 UTC] {myenv.py:32} INFO - Executing logic for alert_1
[2022-06-13, 08:16:54 UTC] {myenv.py:27} INFO - In the implementation for alert_1
[2022-06-13, 08:16:54 UTC] {myenv.py:32} INFO - Executing logic for alert_2
[2022-06-13, 08:16:54 UTC] {myenv.py:27} INFO - In the implementation for alert_2
[2022-06-13, 08:16:54 UTC] {myenv.py:32} INFO - Executing logic for alert_3
[2022-06-13, 08:16:54 UTC] {myenv.py:27} INFO - In the implementation for alert_3

The actual code is much more complex and sophisticated than this; that's why I need faster, more customized debugging logs.

What I am trying to achieve is to customize the log formatter and add RUN_ID and alert_name as part of the log message.

Logs should be something like this:

[2022-06-13, 08:16:54 UTC] [manual__2022-06-13T08:16:54.103265+00:00] {myenv.py:32} INFO - Executing logic for alert_1
[2022-06-13, 08:16:54 UTC] [manual__2022-06-13T08:16:54.103265+00:00] [alert_1] {myenv.py:27} INFO - In the implementation for alert_1
Abhishek Soni

1 Answer


You are already sending the context to the callable, so just make use of it:

def myfunc(**wargs):
    for alert in ['alert_1', 'alert_2', 'alert_3']:
        log.info(f"[{wargs['run_id']}] [{alert}] Executing logic for {alert}")
        implement_alert_logic(alert)

Then pass the whole context, or just the run_id, down to the implement_alert_logic function as well.
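For example, a minimal sketch (the extra run_id parameter is illustrative, not part of the original code):

def implement_alert_logic(alert_name, run_id=None):
    # run_id is handed down from the task context by the caller
    log.info(f'[{run_id}] [{alert_name}] In the implementation for {alert_name}')

def myfunc(**wargs):
    for alert in ['alert_1', 'alert_2', 'alert_3']:
        log.info(f"[{wargs['run_id']}] [{alert}] Executing logic for {alert}")
        implement_alert_logic(alert, run_id=wargs['run_id'])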

Ivan
  • Your code cannot generate the expected outcome shown in the question: **customize the log formatter** is not achieved, and **RUN_ID and alert_name as part of the log_message** is not achieved either, since you logged the `RUN_ID` in a new log line instead of as *part of the log_message*, as shown in the expected output. – ytung-dev Jun 14 '22 at 02:24
  • Unfortunately, the problem with this approach is that if I have 100 log lines in the implementation of the PythonOperator callable, I will have to add `kwargs['run_id']` to every log message. I want to include the RUN_ID and alert_name in the log format itself. – Abhishek Soni Jun 15 '22 at 15:34
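To address the concern in the comments without repeating the run_id in every call, one standard-library option is a logging.LoggerAdapter that prefixes every message logged through it (a sketch; AlertAdapter and the alog parameter are illustrative names, not Airflow APIs):

import logging

class AlertAdapter(logging.LoggerAdapter):
    # Prepends run_id and alert_name to every message logged through
    # this adapter, so individual log calls don't need to repeat them.
    def process(self, msg, kwargs):
        prefix = '[%s] [%s] ' % (self.extra['run_id'], self.extra['alert_name'])
        return prefix + msg, kwargs

def implement_alert_logic(alert_name, alog):
    alog.info(f'In the implementation for {alert_name}')

def myfunc(**wargs):
    for alert in ['alert_1', 'alert_2', 'alert_3']:
        alog = AlertAdapter(log, {'run_id': wargs['run_id'], 'alert_name': alert})
        alog.info(f'Executing logic for {alert}')
        implement_alert_logic(alert, alog)

Because the prefix is applied inside the adapter, it reaches every handler Airflow has already attached, without touching the formatter configuration.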