I have a log that looks like this:

1613635264 host1 AAA 0.11 0.09 0.12 16 13
1613635264 host1 BBB 0.21 0.23 176141
1613635264 host2 AAA 2.08 1.76 1.38 4 3
1613635264 host2 BBB 6.21 0.12 228981
1613635264 host3 AAA 0.58 1.12 1.75 16 0
1613635264 host3 BBB 4.46 0.11 254346
1613635265 host4 AAA 1.07 1.11 1.38 16 4
1613635265 host5 AAA 18.21 17.97 19.19 5 2
1613635265 host4 BBB 3.18 0.40 105858
1613635265 host5 BBB 64.69 1.08 418177

The AAA and BBB lines for a host don't arrive in any fixed order, but both lines of a pair share the same timestamp (first column) and host.

Is it possible to combine each pair of lines into a single event with Logstash?

Something like this:

{
 time: 1613635264,
 host: host1,
 metric1: 0.11,
 metric2: 0.09,
 metric3: 0.12,
 metric4: 16,
 metric5: 13,
 metric6: 0.21,
 metric7: 0.23,
 metric8: 176141
}

I was thinking of upserting both lines into the same document in Elasticsearch. Is that possible?

Daniel
  • This thread might help: https://stackoverflow.com/questions/35203391/logstash-merge-two-logs-into-one-output-document/35204605#35204605 (hint: use `aggregate` filter) – Val Feb 18 '21 at 17:59
  • It's a good idea and would solve the issue, but it has the constraint of working with just 1 worker. That might have an impact on performance. – Daniel Feb 19 '21 at 15:51
  • Performance is only a problem once you've witnessed that it is a problem, not based on assumptions. It doesn't hurt to try. – Val Feb 19 '21 at 15:52
  • The amount of data is high and comes in bursts. It will also be distributed in the near future. – Daniel Feb 19 '21 at 15:59

1 Answer

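I found a solution: I set a document_id that is identical for both rows of a pair, and set doc_as_upsert to true and action to "update". The first row of a pair to arrive creates the document, and the second is merged into it as a partial update.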

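output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # Create the document if it does not exist yet, otherwise merge into it.
    doc_as_upsert => true
    action => "update"
    # Both rows of a pair must produce the same ID. This assumes @timestamp
    # and hostname were both parsed from the log line earlier in the pipeline.
    document_id => "%{@timestamp}%{hostname}"
  }
}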
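
The document_id only matches for both rows if @timestamp is taken from the first column of the log line rather than from the time the event was ingested. A minimal sketch of a filter section that could do this is shown here, assuming the raw line is in `message`. The field names `time`, `type`, `rest` and `metric1` to `metric8` are my own choices to match the desired output in the question, not something the pipeline requires.

```
filter {
  # Split off the fixed leading columns; the last field takes the rest of the line.
  dissect {
    mapping => { "message" => "%{time} %{hostname} %{type} %{rest}" }
  }

  # AAA and BBB rows carry different metrics, so give them distinct field
  # names. Otherwise the second row would overwrite the first on update.
  if [type] == "AAA" {
    dissect {
      mapping => { "rest" => "%{metric1} %{metric2} %{metric3} %{metric4} %{metric5}" }
    }
  } else if [type] == "BBB" {
    dissect {
      mapping => { "rest" => "%{metric6} %{metric7} %{metric8}" }
    }
  }

  # Set @timestamp from the epoch seconds in the first column.
  date {
    match => ["time", "UNIX"]
  }

  # Drop fields that differ between the two rows and are not wanted in the
  # merged document.
  mutate {
    remove_field => ["message", "type", "rest"]
  }
}
```

With this approach the metrics are indexed as strings unless they are converted, and two rows for the same host within the same second would land in the same document.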
Daniel