I am wrestling with ingesting Apache Airflow logs into Elasticsearch, using Logstash filters to parse the log lines. One thing I am struggling to get my head around is how to appropriately handle cases where log lines are nested, e.g. when a task's log line itself contains another full log line. For instance, one log line might look like this:
[2020-01-28 20:23:21,341] {{base_task_runner.py:115}} INFO - Job 389: Subtask delete_consumptiondata [2020-01-28 20:23:21,341] {{cli.py:545}} INFO - Running <TaskInstance: azureconsumption_usage-1.1.delete_consumptiondata 2020-01-27T00:00:00+00:00 [running]> on host devaf1-dk1.sys.dom
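To make the problem concrete, the direction I have been considering is a two-pass grok filter: a first pass that parses the outer frame (timestamp, source file, level) and captures the rest as a message field, and a second pass that looks for another frame embedded inside that message. This is only a sketch - the field names (`outer_*`, `inner_*`) are mine, and I have not settled on it:

```conf
filter {
  # Pass 1: parse the outer log frame; everything after "INFO - " lands in outer_message.
  grok {
    match => {
      "message" => "\[%{TIMESTAMP_ISO8601:outer_timestamp}\] \{\{%{DATA:outer_source}\}\} %{LOGLEVEL:outer_level} - %{GREEDYDATA:outer_message}"
    }
  }
  # Pass 2: grok matches are unanchored, so this finds a nested frame anywhere
  # inside outer_message. Lines without a nested frame simply fail this match,
  # and tag_on_failure => [] keeps them from being tagged _grokparsefailure.
  grok {
    match => {
      "outer_message" => "\[%{TIMESTAMP_ISO8601:inner_timestamp}\] \{\{%{DATA:inner_source}\}\} %{LOGLEVEL:inner_level} - %{GREEDYDATA:inner_message}"
    }
    tag_on_failure => []
  }
}
```

On the example line above, this should leave the `Job 389: Subtask ...` wrapper in `outer_message` while pulling the inner `cli.py:545` frame into the `inner_*` fields - but I am not sure this generalizes, e.g. if lines can nest more than one level deep.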
Does anyone have thoughts on what might be an appropriate way to handle this - or, even better, experience handling nested log lines such as these?