I want to achieve the following group-by aggregation using Elasticsearch. The SQL equivalent would be something like:
SELECT MIN(unix_time) as min_utime, MAX(unix_time) as max_utime
FROM myindex
GROUP BY http_user, src
Would the aggregations be something like this?
"aggs":{
"suser":{
"terms":{
"field":"http_user"
},
"aggs":{
"src":{
"terms":{
"field":"src"
},
"aggs":{
"max_utime":{
"max":{
"field":"unix_time"
}
},
"min_utime":{
"min":{
"field":"unix_time"
}
}
}
}
}
}
}
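For reference, this is how I assemble the full request body in Python (assuming an elasticsearch-py client named `es_client`; `"size": 0` just suppresses the hits so only the aggregations come back):

```python
# Full request body wrapping the nested aggregation.
# size=0 means no hits are returned, only the aggregation results.
body = {
    "size": 0,
    "aggs": {
        "suser": {
            "terms": {"field": "http_user"},
            "aggs": {
                "src": {
                    "terms": {"field": "src"},
                    "aggs": {
                        "max_utime": {"max": {"field": "unix_time"}},
                        "min_utime": {"min": {"field": "unix_time"}},
                    },
                }
            },
        }
    },
}

# Against a live cluster:
# response = es_client.search(index="myindex", body=body)
# aggregations = response["aggregations"]
```

One caveat I am aware of: a `terms` aggregation returns only the top 10 buckets per level by default, so a `size` parameter may be needed on each `terms` block to get all groups.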
I am not sure I understood the aggregations thing correctly. Am I right?
I created a test to simulate the solution. I extract the aggregations into a list of dicts, one row per (suser, src) bucket pair, pairing each bucket key or metric name with its value:
for suser in aggregations.get("suser").get("buckets"):
for src in suser.get('src').get('buckets'):
row = dict(suser=suser.get('key'), src=src.get('key'),
min_utime=src.get('min_utime').get("value"),
max_utime=src.get("max_utime").get("value"))
results_array.append(row)
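To check the extraction logic in isolation, I ran it over a hand-built sample of the response shape I expect from a nested terms aggregation (the keys `alice` and `10.0.0.1` are made up; the timestamps are the ones from my test output):

```python
# Hand-built sample mirroring the nested-bucket response shape.
aggregations = {
    "suser": {
        "buckets": [
            {
                "key": "alice",
                "src": {
                    "buckets": [
                        {"key": "10.0.0.1",
                         "min_utime": {"value": 1475489579.0},
                         "max_utime": {"value": 1475489961.0}},
                    ]
                },
            }
        ]
    }
}

results_array = []
for suser in aggregations["suser"]["buckets"]:
    for src in suser["src"]["buckets"]:
        results_array.append(dict(
            suser=suser["key"], src=src["key"],
            min_utime=src["min_utime"]["value"],
            max_utime=src["max_utime"]["value"]))

# results_array == [{'suser': 'alice', 'src': '10.0.0.1',
#                    'min_utime': 1475489579.0, 'max_utime': 1475489961.0}]
```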
first_row = results_array[0]
http_user = first_row.get('suser')
src = first_row.get('src')
new_query = {
    'query': {
        'bool': {
            'must': [
                {'term': {'http_user': http_user}},
                {'term': {'src': src}}
            ],
            'filter': {
                'range': {
                    '@timestamp': {
                        'gte': datetime.datetime(2016, 10, 3, 10, 22, 44, 569755),
                        'lte': datetime.datetime(2016, 10, 3, 14, 22, 44, 569755)
                    }
                }
            }
        }
    }
}
sort = ["unix_time:asc"]
results = []
for hit in scan(es_client, index="my_index", query=new_query, sort=sort):
    results.append(hit)
first_hit = results[0].get('_source')
self.assertEqual(first_row.get("min_utime"),
first_hit.get("unix_time"))
The test randomly succeeds and fails, and the two times are not equal even when it passes the type check. (The datetime bounds I provide are always the same in both the aggregation query and the new query, as you can see above.)
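One thing I wonder about: as far as I can tell from the `elasticsearch.helpers.scan` documentation, `scan()` defaults to `preserve_order=False`, which replaces any requested sort with the unordered (but cheaper) `_doc` sort, so hits can come back in arbitrary order; that could explain the intermittent failures. A sketch of how I would force the ordering, assuming I read the flag correctly:

```python
# preserve_order=True makes scan() honour the requested sort instead of
# silently replacing it with the unordered "_doc" sort. The docs warn
# this can be an expensive operation.
new_query = {"query": {"match_all": {}}}  # stand-in for the real query above

scan_kwargs = dict(
    index="my_index",
    query=new_query,
    sort=["unix_time:asc"],
    preserve_order=True,  # without this the sort is dropped
)
# results = list(scan(es_client, **scan_kwargs))
```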
Test results:
AssertionError: 1475489579.0 != 1475489620
AssertionError: 1475489961.0 != 1475489980
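One side note I did verify: the trailing `.0` is not what breaks the assertion, since Python compares `int` and `float` by numeric value. The timestamps genuinely differ:

```python
# int/float comparison in Python is by value, so a matching timestamp
# would pass assertEqual even with the trailing .0:
assert 1475489620.0 == 1475489620
# The failures above are therefore real value mismatches:
assert 1475489579.0 != 1475489620
assert 1475489961.0 != 1475489980
```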