I'd like to search an array of nested documents and return only those that match specific criteria.

An example mapping would be:

{"book":
    {"properties":
        {
         "title":{"type":"string"},
         "chapters":{
                     "type":"nested",
                     "properties":{"title":{"type":"string"},
                                   "length":{"type":"long"}}
                    }
        }
    }
}

Say I want to look for chapters titled "epilogue". Not every book has such a chapter, but if I use a nested query, the result for each matching book contains all of that book's chapters, while all I'm interested in is the chapters that actually have that title.

I'm mainly concerned about I/O and network traffic, since there might be a lot of chapters.

Also, is there a way of retrieving ONLY the nested document, without the containing doc?

eran
  • Aren't chapters always nested under the books object? – concept47 May 28 '13 at 10:00
  • You can't with nested docs afaik. You could however remodel this to a parent (book)-child (chapter) relationship. In that case your problem + answer is similar to http://stackoverflow.com/questions/7431889/how-can-i-retrieve-matching-children-only – Geert-Jan May 28 '13 at 10:36
  • Relevant issues on GitHub to make it possible to return a matching nested-context: https://github.com/elasticsearch/elasticsearch/issues/1383 and the newer https://github.com/elasticsearch/elasticsearch/issues/3022 – Geert-Jan May 28 '13 at 10:40
  • @Geert-Jan parent-child is not good enough, since it does the join in-memory, and my DB is huge (several hundreds of GBs...). Thanks for the tip, though :) – eran May 28 '13 at 11:47
  • Parent child improved a lot with 0.90. Maybe you can try it out. Otherwise it's not possible to do what you want in a single query. – javanna May 28 '13 at 11:57
  • @javanna, how about several queries? the first can return, for example, just the ID of the parent (book), can I do it with several queries? – eran May 28 '13 at 12:33
  • Yes if you index them in different documents. You would be doing parent child manually. – javanna May 28 '13 at 12:36
  • Oh. I meant if it's still nested. Alright then, I guess that's a no. Thanks anyway! – eran May 28 '13 at 12:43
  • Problem is that even though nested documents are currently indexed as separate documents internally, you cannot get back from elasticsearch only those separate documents for now. – javanna May 28 '13 at 15:25
  • It looks like it will finally be possible to do this on elasticsearch 1.5: https://github.com/elastic/elasticsearch/issues/2662 – Rafael Almeida Mar 11 '15 at 21:46

1 Answer

This is a very old question I stumbled upon, so I'll show two different approaches to how this can be handled.

Let's prepare an index and some test data first:

PUT /bookindex
{
  "mappings": {
    "book": {
      "properties": {
        "title": {
          "type": "string"
        },
        "chapters": {
          "type": "nested",
          "properties": {
            "title": {
              "type": "string"
            },
            "length": {
              "type": "long"
            }
          }
        }
      }
    }
  }
}

PUT /bookindex/book/1
{
  "title": "My first book ever",
  "chapters": [
    {
      "title": "epilogue",
      "length": 1230
    },
    {
      "title": "intro",
      "length": 200
    }
  ]
}

PUT /bookindex/book/2
{
  "title": "Book of life",
  "chapters": [
    {
      "title": "epilogue",
      "length": 17
    },
    {
      "title": "toc",
      "length": 42
    }
  ]
}

Now that we have this data in Elasticsearch, we can retrieve just the relevant hits using inner_hits. This approach is very straightforward, but I prefer the approach outlined at the end.

# Inner hits query
POST /bookindex/book/_search
{
  "_source": false,
  "query": {
    "nested": {
      "path": "chapters",
      "query": {
        "match": {
          "chapters.title": "epilogue"
        }
      },
      "inner_hits": {}
    }
  }
}

The nested query with inner_hits returns the matching books, and each hit contains an inner_hits object holding only the nested documents that matched, including scoring information.
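As a rough sketch of how a client might pull the matching chapters out of that response, the Python below walks the inner_hits section of each hit. The response dict is a hand-written, trimmed stand-in for what the query above returns for the two test books (scores and other metadata are left out), and matching_chapters is a hypothetical helper name, not part of any client library.

```python
from typing import Any, Dict, List


def matching_chapters(response: Dict[str, Any], path: str = "chapters") -> List[Dict[str, Any]]:
    """Collect the _source of every inner hit, tagged with the parent book id."""
    chapters = []
    for book in response["hits"]["hits"]:
        # inner_hits are keyed by the nested path unless a name was given
        inner = book.get("inner_hits", {}).get(path, {})
        for hit in inner.get("hits", {}).get("hits", []):
            chapter = dict(hit["_source"])
            chapter["book_id"] = book["_id"]
            chapters.append(chapter)
    return chapters


# Hand-written, trimmed stand-in for the response (assumption: a real
# response carries more metadata, such as _score, which is ignored here).
response = {
    "hits": {
        "hits": [
            {"_id": "1", "inner_hits": {"chapters": {"hits": {"hits": [
                {"_source": {"title": "epilogue", "length": 1230}}]}}}},
            {"_id": "2", "inner_hits": {"chapters": {"hits": {"hits": [
                {"_source": {"title": "epilogue", "length": 17}}]}}}},
        ]
    }
}

for chapter in matching_chapters(response):
    print(chapter)
```

Keep in mind that inner_hits returns only the top three matches per parent by default, so a book with many matching chapters needs an explicit size inside the inner_hits object.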

My preferred approach to this type of query is a nested aggregation with a filter sub-aggregation, which in turn contains a top_hits sub-aggregation. The query looks like this:

# Nested and filter aggregation
POST /bookindex/book/_search
{
  "size": 0,
  "aggs": {
    "nested": {
      "nested": {
        "path": "chapters"
      },
      "aggs": {
        "filter": {
          "filter": {
            "match": { "chapters.title": "epilogue" }
          },
          "aggs": {
            "t": {
              "top_hits": {
                "size": 100
              }
            }
          }
        }
      }
    }
  }
}

The top_hits sub-aggregation does the actual retrieval of the nested documents and supports the from and size properties, among others. From the documentation:

If the top_hits aggregator is wrapped in a nested or reverse_nested aggregator then nested hits are being returned. Nested hits are in a sense hidden mini documents that are part of regular document where in the mapping a nested field type has been configured. The top_hits aggregator has the ability to un-hide these documents if it is wrapped in a nested or reverse_nested aggregator. Read more about nested in the nested type mapping.
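To illustrate the from and size properties mentioned above, here is a minimal Python sketch that builds the same request body for an arbitrary page of nested hits. The function name nested_top_hits_body and its parameters are made up for this example; only the JSON structure comes from the query shown earlier.

```python
import json
from typing import Any, Dict


def nested_top_hits_body(path: str, field: str, text: str,
                         size: int = 100, offset: int = 0) -> Dict[str, Any]:
    """Build the nested -> filter -> top_hits request body for one page."""
    return {
        "size": 0,  # suppress the top-level book hits
        "aggs": {
            "nested": {
                "nested": {"path": path},
                "aggs": {
                    "filter": {
                        "filter": {"match": {field: text}},
                        "aggs": {
                            # "from" skips nested hits, "size" caps the page
                            "t": {"top_hits": {"from": offset, "size": size}}
                        },
                    }
                },
            }
        },
    }


# Second page of ten matching chapters
body = nested_top_hits_body("chapters", "chapters.title", "epilogue",
                            size=10, offset=10)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the same POST /bookindex/book/_search request.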

The response from Elasticsearch is (IMO) prettier and easier to parse. It also seems to come back faster, though this is not a scientific observation.
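A sketch of what that parsing could look like in Python follows. The response dict is again a hand-written, trimmed stand-in based on the two test books, not captured output, and chapters_from_aggregation is a hypothetical helper. The keys nested, filter and t are simply the aggregation names chosen in the query above.

```python
from typing import Any, Dict, List


def chapters_from_aggregation(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return each nested hit's _source, tagged with the id of its book."""
    top_hits = response["aggregations"]["nested"]["filter"]["t"]
    chapters = []
    for hit in top_hits["hits"]["hits"]:
        chapter = dict(hit["_source"])
        chapter["book_id"] = hit["_id"]  # nested hits carry the root document id
        chapters.append(chapter)
    return chapters


# Hand-written, trimmed stand-in for the aggregation response (assumption:
# fields such as _index, _type and _score are omitted for brevity).
response = {
    "aggregations": {
        "nested": {
            "doc_count": 4,
            "filter": {
                "doc_count": 2,
                "t": {"hits": {"hits": [
                    {"_id": "1", "_nested": {"field": "chapters", "offset": 0},
                     "_source": {"title": "epilogue", "length": 1230}},
                    {"_id": "2", "_nested": {"field": "chapters", "offset": 0},
                     "_source": {"title": "epilogue", "length": 17}},
                ]}},
            },
        }
    }
}

for chapter in chapters_from_aggregation(response):
    print(chapter)
```

All matching chapters arrive in one flat list here, so no per-book loop is needed, which is a large part of why this shape feels easier to handle.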

miha