I stumbled upon this very old question, so I'll show two different approaches to handling it.
Let's prepare an index and some test data first:
PUT /bookindex
{
  "mappings": {
    "book": {
      "properties": {
        "title": {
          "type": "string"
        },
        "chapters": {
          "type": "nested",
          "properties": {
            "title": {
              "type": "string"
            },
            "length": {
              "type": "long"
            }
          }
        }
      }
    }
  }
}

PUT /bookindex/book/1
{
  "title": "My first book ever",
  "chapters": [
    {
      "title": "epilogue",
      "length": 1230
    },
    {
      "title": "intro",
      "length": 200
    }
  ]
}

PUT /bookindex/book/2
{
  "title": "Book of life",
  "chapters": [
    {
      "title": "epilogue",
      "length": 17
    },
    {
      "title": "toc",
      "length": 42
    }
  ]
}
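If you prefer to load the test data in a single request, the same two documents can be sent through the `_bulk` API. The Python sketch below only builds the newline-delimited request body (it does not talk to a cluster); the helper name `bulk_body` is hypothetical, and the `_type` field assumes an older, pre-7.x cluster like the one the `book` mapping above targets.

```python
import json

# The same two test books as the PUT requests above.
BOOKS = {
    "1": {
        "title": "My first book ever",
        "chapters": [
            {"title": "epilogue", "length": 1230},
            {"title": "intro", "length": 200},
        ],
    },
    "2": {
        "title": "Book of life",
        "chapters": [
            {"title": "epilogue", "length": 17},
            {"title": "toc", "length": 42},
        ],
    },
}


def bulk_body(index, doc_type, docs):
    """Hypothetical helper: return an NDJSON _bulk body (action line + source line per doc)."""
    lines = []
    for doc_id, source in docs.items():
        # Action/metadata line; _type is only accepted by pre-7.x clusters.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(bulk_body("bookindex", "book", BOOKS), end="")
```

The resulting string would be sent as the body of `POST /_bulk` with an `application/x-ndjson` content type.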
Now that we have this data in Elasticsearch, we can retrieve just the relevant nested hits using `inner_hits`. This approach is very straightforward, but I prefer the approach outlined at the end.
# Inner hits query
POST /bookindex/book/_search
{
  "_source": false,
  "query": {
    "nested": {
      "path": "chapters",
      "query": {
        "match": {
          "chapters.title": "epilogue"
        }
      },
      "inner_hits": {}
    }
  }
}
When `inner_hits` is added to a `nested` query, the search still returns the parent documents, but each hit also contains an `inner_hits` object listing all of the matching nested documents, including their scoring information.
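Consuming this response means walking every parent hit and then its `inner_hits` section. The Python sketch below does that on a plain dict (for example, the parsed JSON body of the response). The function name is hypothetical, the sample response is a hand-written, abbreviated illustration based on the test data above (not captured output), and the code assumes the default inner hits name, which for a `nested` query is the nested path (`chapters`). `_source` is read with `.get` because whether it is present depends on your source settings.

```python
def chapters_from_inner_hits(response, name="chapters"):
    """Hypothetical helper: collect matching nested chapters from an inner_hits response."""
    results = []
    for hit in response["hits"]["hits"]:
        # Each parent hit carries its own inner_hits section, keyed by name.
        inner = hit.get("inner_hits", {}).get(name, {})
        for nested_hit in inner.get("hits", {}).get("hits", []):
            results.append({
                "book_id": hit["_id"],
                # _nested.offset is the position inside the chapters array.
                "offset": nested_hit["_nested"]["offset"],
                "chapter": nested_hit.get("_source"),
            })
    return results


# Hand-written, abbreviated response shape (not real output).
SAMPLE_RESPONSE = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "inner_hits": {"chapters": {"hits": {"hits": [
                    {"_nested": {"field": "chapters", "offset": 0},
                     "_source": {"title": "epilogue", "length": 1230}},
                ]}}},
            },
            {
                "_id": "2",
                "inner_hits": {"chapters": {"hits": {"hits": [
                    {"_nested": {"field": "chapters", "offset": 0},
                     "_source": {"title": "epilogue", "length": 17}},
                ]}}},
            },
        ]
    }
}

if __name__ == "__main__":
    for row in chapters_from_inner_hits(SAMPLE_RESPONSE):
        print(row)
```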
My preferred approach to this type of query is a `nested` aggregation with a `filter` sub-aggregation, which in turn contains a `top_hits` sub-aggregation. The query looks like this:
# Nested and filter aggregation
POST /bookindex/book/_search
{
  "size": 0,
  "aggs": {
    "nested": {
      "nested": {
        "path": "chapters"
      },
      "aggs": {
        "filter": {
          "filter": {
            "match": { "chapters.title": "epilogue" }
          },
          "aggs": {
            "t": {
              "top_hits": {
                "size": 100
              }
            }
          }
        }
      }
    }
  }
}
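To reuse this query for other chapter titles, or to page through the matching chapters with the `from` and `size` parameters of `top_hits`, the request body can be built programmatically. This Python sketch reproduces the structure of the request above; the function name and its parameters are hypothetical, and it only builds the dict without sending it.

```python
import json


def nested_top_hits_query(chapter_title, size=100, from_=0):
    """Hypothetical helper: build the nested -> filter -> top_hits aggregation body."""
    return {
        # No regular (parent) hits; everything comes from the aggregation.
        "size": 0,
        "aggs": {
            "nested": {
                "nested": {"path": "chapters"},
                "aggs": {
                    "filter": {
                        "filter": {"match": {"chapters.title": chapter_title}},
                        "aggs": {
                            # from/size page through the matching nested chapters.
                            "t": {"top_hits": {"from": from_, "size": size}}
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(nested_top_hits_query("epilogue", size=10, from_=10), indent=2))
```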
The `top_hits` sub-aggregation is the one that actually retrieves the nested documents, and it supports the `from` and `size` properties, among others. From the documentation:
> If the `top_hits` aggregator is wrapped in a `nested` or `reverse_nested` aggregator then nested hits are being returned. Nested hits are in a sense hidden mini documents that are part of regular document where in the mapping a nested field type has been configured. The `top_hits` aggregator has the ability to un-hide these documents if it is wrapped in a `nested` or `reverse_nested` aggregator. Read more about nested in the nested type mapping.
In my opinion, the response from Elasticsearch is prettier and "easier" to parse. It also seems to come back faster, though that is not a scientific observation.
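To illustrate the "easier to parse" point, here is a Python sketch that reads the matching chapters out of the aggregation response. Everything sits under one path (`aggregations.nested.filter.t.hits.hits`), so no loop over parent documents is needed. The function name is hypothetical, and the sample response is a hand-written, abbreviated illustration based on the test data above, not captured output; it assumes each nested hit carries the `_id` of its parent book plus a `_nested.offset` into the `chapters` array.

```python
def chapters_from_top_hits(response):
    """Hypothetical helper: flatten the hits of the nested -> filter -> top_hits aggregation."""
    top = response["aggregations"]["nested"]["filter"]["t"]["hits"]["hits"]
    return [
        {
            # _id identifies the parent book that owns the chapter.
            "book_id": hit["_id"],
            "offset": hit["_nested"]["offset"],
            "chapter": hit["_source"],
        }
        for hit in top
    ]


# Hand-written, abbreviated response shape (not real output; order is illustrative).
SAMPLE_AGG_RESPONSE = {
    "aggregations": {
        "nested": {
            "filter": {
                "t": {
                    "hits": {
                        "hits": [
                            {"_id": "1",
                             "_nested": {"field": "chapters", "offset": 0},
                             "_source": {"title": "epilogue", "length": 1230}},
                            {"_id": "2",
                             "_nested": {"field": "chapters", "offset": 0},
                             "_source": {"title": "epilogue", "length": 17}},
                        ]
                    }
                }
            }
        }
    }
}

if __name__ == "__main__":
    for row in chapters_from_top_hits(SAMPLE_AGG_RESPONSE):
        print(row)
```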