
I'm using Elasticsearch.Net and NEST in an application and I'm having trouble accessing documents in an Elasticsearch index when searching on nested object IDs. The data structure is invoice -> lineItems -> rowItems, and I want to search on the rowItem IDs. The (simplified) mapping of the index is:

"invoice": {
    "properties": {
      "lineItems": {
        "properties": {
          "accountId": {
            "index": "not_analyzed",
            "type": "string"
          },
          "listItems": {
            "properties": {
              "itemName": {
                "analyzer": "str_index_analyzer",
                "term_vector": "with_positions_offsets",
                "type": "string",
                "fields": {
                  "raw": {
                    "analyzer": "str_search_analyzer",
                    "type": "string"
                  }
                }
              },
              "listItemID": {
                "index": "not_analyzed",
                "type": "string"
              }
            }
          }
        }
      }
    }
}

When I run a search in Sense (in Chrome) for one of the nested objects, I can successfully retrieve it:

POST /_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "lineItems.rowItems.rowItemID": "23f2157f-eb21-400d-b3a1-a61cf1451262"
          }
        }
      ]
    }
  }
}

This returns the Invoice document with all of its details.

I've been trying to do the same thing with NEST but haven't succeeded so far. I have a list of rowItem IDs and I want to get all invoice documents that contain an exact match on any of those IDs. This is what I currently have:

var result = Execute(client => client.Search<Invoice>(s => s
    .Aggregations(a => a
        .Nested("my_nested_agg", n => n
            .Path("lineItems")
            .Aggregations(aa => aa
                .Filter("my_avg_agg", avg => avg
                    .Field(p => searchIds.Contains(p.LineItems.First().RowItems.First().TrackingItemID))
                )
            )
        )
    )
));

Here, searchIds is the list of rowItem IDs I'm searching for. The code above is clearly wrong, and I'm not familiar with the syntax for doing this. Any help would be greatly appreciated.

CorribView
  • I noticed here: http://stackoverflow.com/questions/29346935/elasticsearch-nested-object-under-path-is-not-of-nested-type that the nested keyword is needed for doing nested searches, i.e. "type": "nested". Is it possible to search for these IDs without it (my mapping doesn't have it)? Also, a full reindex is not an option at the moment due to latency in the system. – CorribView Jan 30 '17 at 21:32

1 Answer


Nested types are needed in scenarios where you wish to query across multiple properties of an object property. Given your example:

  1. If you want to only query the listItemID of listItems on lineItems then having an object type for this will work fine.

  2. If you want to query the listItemID and itemName of listItems on lineItems, you would need to map listItems as a nested type.

The reason for this is that, without a nested type, an array of listItems is flattened into one multi-valued field per property when indexed, so the association between the properties of a particular listItem is lost. With a nested type, the association is kept (each nested object is internally stored as a separate, hidden document).
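To illustrate case 2, here is a minimal sketch of a nested query body (sent to POST /_search). It assumes listItems has been remapped with "type": "nested" under the object field lineItems, which is not the case in the question's mapping; changing an existing field from object to nested requires reindexing. The itemName value "widget" is a hypothetical example. Because the nested query evaluates each listItem as its own document, both clauses must match on the same listItem:

```json
{
  "query": {
    "nested": {
      "path": "lineItems.listItems",
      "query": {
        "bool": {
          "filter": [
            { "term": { "lineItems.listItems.listItemID": "23f2157f-eb21-400d-b3a1-a61cf1451262" } }
          ],
          "must": [
            { "match": { "lineItems.listItems.itemName": "widget" } }
          ]
        }
      }
    }
  }
}
```

Note that "path" is the full dotted path to the nested field, even when its parent (lineItems here) is a plain object.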

The search query that you have looks very similar in NEST; in this case the match query doesn't need to be wrapped in a bool query should clause:

var client = new ElasticClient();

var searchResponse = client.Search<Invoice>(s => s
    .AllIndices()
    .AllTypes()
    .Query(q => q
        .Match(m => m
            .Field(f => f.LineItems.First().ListItems.First().ListItemID)
            .Query("23f2157f-eb21-400d-b3a1-a61cf1451262")
        )
    )
);

The lambda expression passed to .Field() is just that: an expression that NEST inspects to infer the field name. It is never executed, so calling First() on the collections is fine.

This generates the following query:

POST http://localhost:9200/_search
{
  "query": {
    "match": {
      "lineItems.listItems.listItemID": {
        "query": "23f2157f-eb21-400d-b3a1-a61cf1451262"
      }
    }
  }
}

Since listItemID is a not_analyzed string field, you can use a term query here instead. And since you probably don't need a relevance score calculated (a document either matches or it doesn't), you can wrap the query in a bool query filter clause, which can take advantage of filter caching and should perform slightly better.
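For example, a sketch of the request body (sent to POST /_search) with a term query inside a bool filter clause, using the same ID as above:

```json
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "lineItems.listItems.listItemID": "23f2157f-eb21-400d-b3a1-a61cf1451262"
          }
        }
      ]
    }
  }
}
```

The term query compares the value exactly against the indexed term, which matches how a not_analyzed field stores the whole ID as a single term.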

To get the documents that match any ID in a collection, we can use the terms query:

var ids = new[] {
    "23f2157f-eb21-400d-b3a1-a61cf1451262",
    "23f2157f-eb21-400d-b3a1-a61cf1451263",
    "23f2157f-eb21-400d-b3a1-a61cf1451264"
};

var searchResponse = client.Search<Invoice>(s => s
    .AllIndices()
    .AllTypes()
    .Query(q => q
        .Terms(m => m
            .Field(f => f.LineItems.First().ListItems.First().ListItemID)
            .Terms(ids)
        )
    )
);
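The terms query above should serialize to a request body along these lines (exact formatting may differ); a document matches if any of its listItemID values equals any of the listed IDs:

```json
{
  "query": {
    "terms": {
      "lineItems.listItems.listItemID": [
        "23f2157f-eb21-400d-b3a1-a61cf1451262",
        "23f2157f-eb21-400d-b3a1-a61cf1451263",
        "23f2157f-eb21-400d-b3a1-a61cf1451264"
      ]
    }
  }
}
```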

And finally, here is a shorthand for wrapping this in a bool query filter clause, using the unary + operator:

var searchResponse = client.Search<Invoice>(s => s
    .AllIndices()
    .AllTypes()
    .Query(q => +q
        .Terms(m => m
            .Field(f => f.LineItems.First().ListItems.First().ListItemID)
            .Terms(ids)
        )
    )
);
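With the unary + operator, the terms query ends up inside a bool query filter clause, so the request body should look roughly like this (a sketch; NEST's exact output may differ in formatting):

```json
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "lineItems.listItems.listItemID": [
              "23f2157f-eb21-400d-b3a1-a61cf1451262",
              "23f2157f-eb21-400d-b3a1-a61cf1451263",
              "23f2157f-eb21-400d-b3a1-a61cf1451264"
            ]
          }
        }
      ]
    }
  }
}
```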
Russ Cam
  • Awesome, thanks Russ for clearing up my confusion regarding nested vs. object types. I implemented your code and can successfully retrieve the hard-coded ID. One final part: if I have an array of IDs, how can I modify the above so it searches for all IDs in the array instead of just one? – CorribView Jan 30 '17 at 22:08
  • Awesome, it works. Appreciate the help on this, thorough and descriptive answer! – CorribView Jan 30 '17 at 22:23
  • @CorribView no worries :) I added a final piece about the shorthand for the `bool` query `filter` clause: https://www.elastic.co/guide/en/elasticsearch/client/net-api/2.x/bool-queries.html – Russ Cam Jan 30 '17 at 22:34
  • Quick follow up question: The ES query seems to be returning a default of 10 documents even if there are more in the index. How can I explicitly set it to return all matching documents? Thanks. – CorribView Feb 01 '17 at 15:48
  • You usually don't want to return **all** matching documents in one response; you can change the number of documents returned with `.Size(int)` (aliased with `.Take(int)`). You can fetch pages of documents of size x, skipping previous pages with `.From(x)` (aliased with `.Skip(x)`). If you want to do deep pagination, look at `search_after`, and if you want to return a _lot_ of documents efficiently, look at the `scroll` API. – Russ Cam Feb 01 '17 at 23:09
  • Thanks for the extra information. We're using pagination when retrieving the data from SQL Server, and each page updates the ES index before the next page queries the DB. I'll just have to make sure that, when getting the documents from the ES index, it updates all of the root documents that contain the nested object ID (which can appear in many different root documents). – CorribView Feb 02 '17 at 14:51
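As a rough sketch of the scroll approach mentioned in the comments, expressed as a raw request rather than NEST calls: send the body below to POST /_search?scroll=1m to open a scroll context that is kept alive for one minute and returns the first batch of hits. The size of 1000 and the 1m keep-alive are arbitrary assumptions; sorting on _doc is the cheapest order for scrolling when relevance order doesn't matter.

```json
{
  "size": 1000,
  "sort": [ "_doc" ],
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "lineItems.listItems.listItemID": [
              "23f2157f-eb21-400d-b3a1-a61cf1451262",
              "23f2157f-eb21-400d-b3a1-a61cf1451263",
              "23f2157f-eb21-400d-b3a1-a61cf1451264"
            ]
          }
        }
      ]
    }
  }
}
```

The response includes a _scroll_id; fetch the next batch by sending {"scroll": "1m", "scroll_id": "<scroll_id from the previous response>"} to POST /_search/scroll, and repeat until a batch comes back with no hits. Clearing the scroll afterwards with the clear-scroll API frees its resources without waiting for the keep-alive to expire.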