I have a two-level data structure that I'm trying to search and compute statistics on. Let's imagine the data is posts and comments. It would look something like this:
[
{"title": "Post 1", "comments": [
{"name": "Comment 1", "date": "2019-01-10", "character_count": 1000},
{"name": "Comment 2", "date": "2019-01-11", "character_count": 2000},
{"name": "Comment 3", "date": "2019-01-12", "character_count": 1500},
{"name": "Comment 4", "date": "2019-01-13", "character_count": 2500},
{"name": "Comment 5", "date": "2019-01-15", "character_count": 3000}
]},
{"title": "Post 2", "comments": [
{"name": "Comment 1", "date": "2019-01-10", "character_count": 400},
{"name": "Comment 2", "date": "2019-01-13", "character_count": 500},
{"name": "Comment 3", "date": "2019-01-15", "character_count": 4000},
]}
]
Several notes on the data:
- Every "comment" has many more attributes than just character_count (probably 30).
- There aren't very many "comments" per "post": rarely over 200, and usually under 30.
- There are thousands of "posts".
- In the mapping, "comments" are a nested object.
When making the query, I'm only interested in one "comment" per "post", based on date. For a given report date, I need the latest comment dated strictly before that date. I'm able to get this data using a scripted field:
{
"script_fields": {
"relevant_comment": {
"script": "sorted = _source.comments.findAll { it.date < report_date }.sort { a, b -> b.date <=> a.date }; return sorted ? sorted.first() : null;",
"params": {
"report_date": "2019-01-12"
}
}
}
}
So for "2019-01-12" I'd get {"name": "Comment 2", "date": "2019-01-11", "character_count": 2000} for "Post 1" and {"name": "Comment 1", "date": "2019-01-10", "character_count": 400} for "Post 2".
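The selection the script performs can be modelled outside Elasticsearch to check the expected results. This is a minimal Python sketch, assuming the posts are plain dicts shaped like the JSON above and that dates are ISO-8601 strings (so plain string comparison orders them correctly); `relevant_comment` is a hypothetical helper name. It uses a single `max` pass instead of filter-then-sort, which yields the same comment because only the latest one is kept.

```python
from typing import Optional

# Sample data copied from the question; dates are ISO-8601 strings.
posts = [
    {"title": "Post 1", "comments": [
        {"name": "Comment 1", "date": "2019-01-10", "character_count": 1000},
        {"name": "Comment 2", "date": "2019-01-11", "character_count": 2000},
        {"name": "Comment 3", "date": "2019-01-12", "character_count": 1500},
        {"name": "Comment 4", "date": "2019-01-13", "character_count": 2500},
        {"name": "Comment 5", "date": "2019-01-15", "character_count": 3000},
    ]},
    {"title": "Post 2", "comments": [
        {"name": "Comment 1", "date": "2019-01-10", "character_count": 400},
        {"name": "Comment 2", "date": "2019-01-13", "character_count": 500},
        {"name": "Comment 3", "date": "2019-01-15", "character_count": 4000},
    ]},
]


def relevant_comment(post: dict, report_date: str) -> Optional[dict]:
    """Hypothetical helper: latest comment dated strictly before report_date, or None."""
    # Same filter as the scripted field: keep only comments before the report date.
    earlier = [c for c in post["comments"] if c["date"] < report_date]
    # max() finds the latest remaining comment in one pass; no full sort needed.
    return max(earlier, key=lambda c: c["date"], default=None)


for post in posts:
    print(post["title"], relevant_comment(post, "2019-01-12"))
```

With "2019-01-12" this picks Comment 2 (2000 characters) for Post 1 and Comment 1 (400 characters) for Post 2, matching the results above.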
Now I need to aggregate over these selected comments, but how do I do that? For example, I need the average character count of the selected comments, or the number of "posts" whose selected comment has a character count under a certain value.
There's an answer here that suggests putting the script into the aggregation itself. While that works, it feels wasteful to filter and sort the comment list again for every single attribute I need.
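To make the desired statistics concrete, here is a client-side Python model that selects the relevant comment once per post and then derives several statistics from that single selection, rather than repeating the filter and sort per attribute. It is a sketch, not an Elasticsearch query: `summarize` and its `threshold` parameter are hypothetical names, and it assumes the posts are dicts shaped like the JSON above.

```python
from typing import Optional

# Sample data copied from the question; dates are ISO-8601 strings.
posts = [
    {"title": "Post 1", "comments": [
        {"name": "Comment 1", "date": "2019-01-10", "character_count": 1000},
        {"name": "Comment 2", "date": "2019-01-11", "character_count": 2000},
        {"name": "Comment 3", "date": "2019-01-12", "character_count": 1500},
        {"name": "Comment 4", "date": "2019-01-13", "character_count": 2500},
        {"name": "Comment 5", "date": "2019-01-15", "character_count": 3000},
    ]},
    {"title": "Post 2", "comments": [
        {"name": "Comment 1", "date": "2019-01-10", "character_count": 400},
        {"name": "Comment 2", "date": "2019-01-13", "character_count": 500},
        {"name": "Comment 3", "date": "2019-01-15", "character_count": 4000},
    ]},
]


def relevant_comment(post: dict, report_date: str) -> Optional[dict]:
    """Latest comment dated strictly before report_date, or None."""
    earlier = [c for c in post["comments"] if c["date"] < report_date]
    return max(earlier, key=lambda c: c["date"], default=None)


def summarize(posts: list, report_date: str, threshold: int) -> dict:
    """Hypothetical: select each post's comment once, then compute every statistic from it."""
    selected = [relevant_comment(p, report_date) for p in posts]
    # Posts with no comment before report_date contribute nothing.
    counts = [c["character_count"] for c in selected if c is not None]
    return {
        "avg_character_count": sum(counts) / len(counts) if counts else None,
        "posts_under_threshold": sum(1 for n in counts if n < threshold),
    }


print(summarize(posts, "2019-01-12", threshold=1000))
# prints {'avg_character_count': 1200.0, 'posts_under_threshold': 1}
```

The point of the structure is that the expensive step (picking the comment) runs once per post, and each additional statistic is only a cheap pass over the already-selected values.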
Or maybe I don't even need the scripted fields and there's a different solution?