I'm trying to pull the distinct latest document per group from an Elasticsearch server using Logstash, drop some fields and zero-valued locations, and then insert the result into Redis. In the Elasticsearch data, I have a name field, some description fields, and two location values: latitude and longitude. Other people manage the Elasticsearch server, so I cannot change any of its configuration.
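For reference, a document in my index looks roughly like this (field names as I described above; the actual values are made up for illustration):

```json
{
  "name": "station-42",
  "description": "some description text",
  "latitude": 52.37,
  "longitude": 4.89,
  "country": "NL",
  "@timestamp": "2016-05-10T12:00:00Z"
}
```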
I found this Stack Overflow question that I think fits my problem: How to get latest values for each group with an Elasticsearch query?
So I tried using the query from the accepted answer in my Logstash configuration. Below is my configuration file:
input {
  elasticsearch {
    hosts => "elasticdb"
    size => 1000
    index => "logstash-db"
    query => '{"aggs":{"group":{"terms":{"field":"name.raw"},"aggs":{"group_docs":{"top_hits":{"size":1,"sort":[{"@timestamp":{"order":"desc"}}]}}}}}}'
  }
}
filter {
  if [latitude] == 0 and [longitude] == 0 {
    drop { }
  }
  mutate {
    remove_field => ["country","message","type","@timestamp","@version"]
  }
}
output {
  redis {
    data_type => "list"
    key => "%{name}"
  }
  stdout {
    codec => rubydebug
  }
}
The Elasticsearch query (pretty-printed for readability):
{
  "aggs": {
    "group": {
      "terms": {
        "field": "name.raw"
      },
      "aggs": {
        "group_docs": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "@timestamp": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
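To sanity-check the query outside Logstash, it can also be sent directly to the cluster with curl (the hostname comes from my config; the port 9200 is an assumption, adjust as needed):

```shell
# Run the aggregation directly against Elasticsearch.
# size=0 suppresses the plain hit list so only the aggregation buckets
# are returned; each bucket should contain exactly one latest document.
curl -s -X POST 'http://elasticdb:9200/logstash-db/_search?size=0' \
  -H 'Content-Type: application/json' \
  -d '{
    "aggs": {
      "group": {
        "terms": { "field": "name.raw" },
        "aggs": {
          "group_docs": {
            "top_hits": {
              "size": 1,
              "sort": [ { "@timestamp": { "order": "desc" } } ]
            }
          }
        }
      }
    }
  }'
```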
I tried running Logstash with the above configuration, but Logstash keeps emitting some rows with exactly the same data multiple times. Any help?