
I have logs coming from various sources, and the format of the logs is:

[2018-11-20 11:27:41,187] {base_task.py:98} INFO - Subtask: [2018-11-20 11:27:41,186] {child_task.py:355} INFO - Inside poll job status

[2018-11-20 11:27:41,187] {base_task.py:98} INFO - Subtask: [2018-11-20 11:27:41,186] {child_task.py:357} DEBUG - Poll time out has been set to: 6 hr(s)

[2018-11-20 11:27:41,188] {base_task.py:98} INFO - Subtask: [2018-11-20 11:27:41,186] {child_task.py:369} DEBUG - Batch_id of the running job is = 123456

[2018-11-20 11:27:41,188] {base_task.py:98} INFO - Subtask: [2018-11-20 11:27:41,186] {child_task.py:377} DEBUG - Getting cluster ID for the cluster: 

I want to push these logs to Elasticsearch with batch_id as the index. How can I achieve this? The issue is that batch_id appears only in some of the lines, not in all of them. I have written a custom parser to convert the logs into JSON.

My td-agent.conf is:

<source>
  @type tail
  path /tmp/logs/airflow.logs
  pos_file /tmp/logs/airflow1.pos
  format /^\[(?<logtime>[^\]]*)\] \{(?<parent_script>[^ ]*)\} (?<parent_script_log_level>[^ ]*) - (?<subtask_name>[^ ]*): \[(?<subtask_log_time>[^\]]*)\] \{(?<script_name>[^ ]*)\} (?<script_log_info>[^ ]*) - (?<message>[^*]*)/
  time_key logtime
  tag airflow_123
  read_from_head true
  include_tag_key true
  tag_key event_tag
  @log_level debug
</source>

<match airflow_123>
  @type copy
  <store>
    @type stdout
  </store>
  <store>
    @type elasticsearch
    host es_host
    port es_port
    index_name fluentd.${tag}.%Y%m%d
    <buffer tag, time>
      timekey 1h # one chunk per hour ("3600" also works)
    </buffer>
    type_name log
    with_transporter_log true
    @log_level debug
  </store>
</match>
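
The rough direction I am considering for the batch_id part (untested; it assumes the built-in record_transformer filter and the elasticsearch output's target_index_key option) is to pull the batch id out of the message into its own field and let the output pick the index from it:

<filter airflow_123>
  @type record_transformer
  enable_ruby true
  <record>
    # extract the numeric batch id from the message, if the line has one
    batch_id ${record["message"].to_s[/Batch_id of the running job is = (\d+)/, 1]}
  </record>
</filter>

<match airflow_123>
  @type elasticsearch
  host es_host
  port es_port
  # use the value of the batch_id field as the index name;
  # lines without a batch id end up with an empty field, which would still
  # need to be dropped or mapped to a default before this works cleanly
  target_index_key batch_id
  index_name fluentd.default
  type_name log
</match>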

Also, what would be the best practice for log aggregation using the EFK stack?


1 Answer


If you want to stick to the components of the Elastic stack, the logs can be read, parsed, and persisted as follows:

  1. Filebeat: Reads the events (every logical line of the logs) and pushes them to Logstash.
  2. Logstash: Parses the logs to break the strings up into meaningful fields as per your requirement. Strings can be parsed using grok filters, which is preferable to building custom parsers. The parsed information is sent to Elasticsearch to be persisted and indexed, preferably based on timestamp (see the sketch after this list).
  3. Kibana: Visualizes the parsed information using searches or aggregations.
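
A minimal sketch of such a pipeline, assuming Filebeat ships the raw lines to Logstash on port 5044 (the port, index name, and field names below are illustrative, not something your setup already defines):

# filebeat.yml (illustrative): tail the Airflow log file and ship it to Logstash
filebeat.inputs:
  - type: log
    paths:
      - /tmp/logs/airflow.logs
output.logstash:
  hosts: ["logstash_host:5044"]

# Logstash pipeline (illustrative): parse the lines and index them in Elasticsearch
input {
  beats {
    port => 5044
  }
}

filter {
  # split each line into the outer (scheduler) and inner (subtask) parts
  grok {
    match => {
      "message" => "\[%{TIMESTAMP_ISO8601:logtime}\] \{%{DATA:parent_script}\} %{LOGLEVEL:parent_log_level} - %{DATA:subtask_name}: \[%{TIMESTAMP_ISO8601:subtask_log_time}\] \{%{DATA:script_name}\} %{LOGLEVEL:script_log_level} - %{GREEDYDATA:log_message}"
    }
  }

  # pick up the batch id on the lines that carry it; other lines pass through unchanged
  grok {
    match => { "log_message" => "Batch_id of the running job is = %{NUMBER:batch_id}" }
    tag_on_failure => []
  }

  # use the event's own timestamp instead of the ingestion time
  date {
    match => ["logtime", "yyyy-MM-dd HH:mm:ss,SSS"]
  }
}

output {
  elasticsearch {
    hosts => ["http://es_host:9200"]      # adjust to your cluster
    index => "airflow-%{+YYYY.MM.dd}"     # or "%{batch_id}" once every event carries one
  }
}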