We are observing that Airflow is sending a large volume of logs to Datadog, and we want to reduce it by excluding the logs from the tasks below:

  • pod_manager.py
  • base.py
  • base_aws.py
  • logging_mixin.py

Is there a configuration setting where I can define this requirement?

We have Airflow 2.0 running on Kubernetes.

  • To send only a specific subset of logs to Datadog, use the log_processing_rules parameter in your configuration file with the exclude_at_match or include_at_match type. Refer to this [doc](https://docs.datadoghq.com/agent/logs/advanced_log_collection/?tab=configurationfile#exclude-at-match) for more information and let me know if this helps. – Hemanth Kumar Feb 13 '23 at 06:12
  • Thanks @HemanthKumar. I am trying to implement it, but I don't understand which Airflow component generates the logs above, as I can only see YAML files for the scheduler and the webserver. – suresh choudhary Feb 13 '23 at 07:01
  • Can you try the following to exclude logs from the tasks named pod_manager.py, base.py, base_aws.py, drgn_kubernetes_pod_operator.py, logging_mixin.py, and standard_task_runner.py? Use this configuration setting under [logging]: `exclude_tasks=pod_manager.py,base.py,base_aws.py,drgn_kubernetes_pod_operator.py,logging_mixin.py,standard_task_runner.py` It can be added to the `airflow.cfg` file, which is located in the Airflow home directory. Refer to this [SO](https://stackoverflow.com/a/70957601/19230181) – Hemanth Kumar Feb 13 '23 at 07:07
  • @HemanthKumar I tested the above method locally and it's not working. I can still see the task logs after excluding them this way. Please let me know if there is anything else I can try. – suresh choudhary Feb 13 '23 at 09:01
  • Have you tried the solution from the first comment? Try that configuration and let me know what error you get. – Hemanth Kumar Feb 13 '23 at 09:40
  • @HemanthKumar Yes, the first solution will work, but I don't understand which Airflow component is generating the logs mentioned above. Are all of these logs generated by the scheduler? – suresh choudhary Feb 16 '23 at 01:58
  • I think so. Can you have a look at the scheduler and try the first solution there? – Hemanth Kumar Feb 16 '23 at 08:34
  • @HemanthKumar I tested it and the first solution is working fine. I am able to exclude all the logs using the `.*` pattern. I am struggling to write a regex that excludes only the ones above. I tried a regex like "^.*(pod_manager| base|base_aws|logging_mixin).*$". This removes only the name of the task; apart from those words, the entire line is still present in the logs. Does this regex look fine? Am I making a mistake? – suresh choudhary Feb 20 '23 at 07:29
  • Your regex looks fine to me; can you try it once more? If it's not working, you can use a negative lookahead assertion as below: `"^.*(?!pod_manager|base|base_aws|logging_mixin).*$"`. Refer to [A regular expression to exclude a word/string](https://stackoverflow.com/questions/2078915/) and [How to exclude a specific string constant?](https://stackoverflow.com/questions/1395177/) for more information. – Hemanth Kumar Feb 20 '23 at 09:27
  • No, these regex patterns are not working either. It works when I use exclude_all or include_all, but not with any kind of pattern. – suresh choudhary Feb 20 '23 at 09:41
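
For anyone trying to debug the patterns discussed in these comments, here is a rough local check in Python. It assumes the log lines use Airflow's default format, `[timestamp] {filename.py:lineno} LEVEL - message`; the sample messages are made up. The helper `apply_exclude_at_match` is hypothetical and only imitates an `exclude_at_match` rule by dropping every line the pattern matches anywhere. Note that the Datadog Agent is written in Go, and Go's regexp syntax has no lookahead, so the `(?!...)` pattern is unlikely to be accepted there; in Python it compiles, but it matches every line, so it could not single out these files anyway.

```python
import re

# Sample lines in Airflow's default log format (messages are invented).
lines = [
    "[2023-02-20 07:29:00,000] {pod_manager.py:203} INFO - Pod is running",
    "[2023-02-20 07:29:01,000] {base.py:73} INFO - Using connection ID 'aws_default'",
    "[2023-02-20 07:29:02,000] {base_aws.py:100} INFO - Creating session",
    "[2023-02-20 07:29:03,000] {logging_mixin.py:104} INFO - hello",
    "[2023-02-20 07:29:04,000] {taskinstance.py:1300} INFO - Marking task as SUCCESS",
]

# Negative-lookahead pattern from the comments. The leading ".*" can swallow
# the whole line, after which the lookahead trivially succeeds at the end,
# so this matches every line regardless of its content.
lookahead = re.compile(r"^.*(?!pod_manager|base|base_aws|logging_mixin).*$")

# Candidate pattern (an assumption, not a tested Datadog rule): anchor on the
# "{filename.py:lineno}" field so that "base" does not also hit other text.
candidate = re.compile(r"\{(pod_manager|base|base_aws|logging_mixin)\.py:\d+\}")


def apply_exclude_at_match(pattern, log_lines):
    """Hypothetical stand-in for exclude_at_match: drop lines the pattern matches."""
    return [line for line in log_lines if not pattern.search(line)]


kept = apply_exclude_at_match(candidate, lines)
for line in kept:
    print(line)
```

The candidate pattern uses only constructs that Go's regexp package also supports (grouping, alternation, `\d`, escaped braces). When it is placed in a YAML or JSON configuration, the backslashes may need extra escaping depending on the quoting style.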

1 Answer


To send only a specific subset of logs to Datadog, use the `log_processing_rules` parameter in your configuration file with the `exclude_at_match` or `include_at_match` type. Apply this to the scheduler, which is generating the logs. Refer to the official Datadog [doc](https://docs.datadoghq.com/agent/logs/advanced_log_collection/?tab=configurationfile#exclude-at-match) for more information.
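
Since Airflow runs on Kubernetes here, one way to attach such a rule is through a Datadog Autodiscovery pod annotation of the form `ad.datadoghq.com/<container_name>.logs` instead of an Agent configuration file. The following Python sketch only builds the annotation value. The container name `scheduler`, the `source` and `service` values, the rule name, and the pattern are all assumptions for illustration and should be adapted to the actual deployment.

```python
import json

# Hypothetical container name; use the container that actually emits the logs.
CONTAINER_NAME = "scheduler"

# Assumed pattern: matches the "{filename.py:lineno}" field of Airflow's
# default log format for the four files named in the question.
PATTERN = r"\{(pod_manager|base|base_aws|logging_mixin)\.py:\d+\}"

log_config = [
    {
        "source": "airflow",    # assumed value
        "service": "airflow",   # assumed value
        "log_processing_rules": [
            {
                "type": "exclude_at_match",
                "name": "exclude_noisy_airflow_modules",  # arbitrary rule name
                "pattern": PATTERN,
            }
        ],
    }
]

annotation_key = f"ad.datadoghq.com/{CONTAINER_NAME}.logs"
# json.dumps takes care of escaping the backslashes in the pattern.
annotation_value = json.dumps(log_config)

print(annotation_key)
print(annotation_value)
```

The printed key and value would go under `metadata.annotations` of the pod template for whichever component produces these log lines.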

Hemanth Kumar