
I have been playing around with Logstash and have been able to insert some of my MySQL tables into Elasticsearch.

Now, I need to combine multiple rows from a MySQL query statement into one document. For example, let's say I have a MySQL query with the following result:

1   Group1  Company1    User1
1   Group1  Company1    User2
1   Group2  Company2    User3
1   Group2  Company2    User4

I would like to create 2 Group documents in Elasticsearch, for Group1 and Group2. Inside Group1, I will need to add User1 and User2. Inside Group2, I will need to add User3 and User4.

Is this possible using Logstash combined with some filters? If so, can anyone point me in the right direction please?

Karthik H
  • These answers might help: http://stackoverflow.com/a/36982705/4604579 + http://stackoverflow.com/a/34691171/4604579 – Val Mar 23 '17 at 04:50

1 Answer


You can use the logstash-filter-aggregate plugin. The filter takes a task_id, in your case the group field, and aggregates the rows that share the same task_id value. So events with group1 are aggregated; then, when the filter sees a new group value, the aggregated user data for group1 is pushed as a single Logstash event to the output, becoming one Elasticsearch document. The process then starts over with group2.

Make sure the Logstash pipeline runs with a single filter worker (the -w 1 flag, or pipeline.workers in the config file) and that the rows arrive ordered so that all of the group1 events come in first, then all of group2, and so on (e.g. via an ORDER BY in the SQL statement). Otherwise the events may be processed out of sequence, which can cause unexpected results or data loss.
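To get that ordering, the SQL in the jdbc input can sort by the group column. A minimal sketch of such an input (connection details, table, and column names are assumptions; `group` is a reserved word in MySQL, so it is aliased here):

```
input {
  jdbc {
    jdbc_driver_library => "mysql-connector-java-5.1.36-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "mysql"
    jdbc_password => "secret"
    # ORDER BY keeps rows of the same group contiguous,
    # which the aggregate filter below relies on
    statement => "SELECT id, group_name AS 'group', company, user FROM my_table ORDER BY group_name"
  }
}
```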

filter {
  aggregate {
    task_id => "%{group}"
    code => "
      # collect each row's user into an array on the aggregation map
      map['users'] ||= []
      map['users'].push(event.get('user'))
    "
    # push the previous group's map as a new event once a new group
    # value arrives; the final group is flushed on timeout or shutdown
    push_previous_map_as_event => true
    # the pushed event gets this tag, so it can be told apart from raw rows
    timeout_tags => ['aggregated']
  }

  # drop the original per-row events; keep only the aggregated ones
  if "aggregated" not in [tags] {
    drop {}
  }
}
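For completeness, a sketch of an elasticsearch output that keys each document on the group value (host and index names are assumptions):

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "groups"
    # one document per group; re-running the query updates the
    # existing document instead of creating a duplicate
    document_id => "%{group}"
  }
}
```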

The result will look something like this:

"_source": {
  "group": "group1",
  "users": ["user1", "user2"]
}
cattastrophe