I'm indexing data of unknown schema in Elasticsearch using dynamic mapping, i.e. we don't know the shape, datatypes, etc. of much of the data ahead of time. In queries, I want to be able to aggregate on any field. Strings are (by default) mapped as both text and keyword types, and only the latter can be aggregated on. So for strings my terms aggregations must look like this:

"aggs": {
    "something": {
        "terms": {
            "field": "something.keyword"
        }
    }
}

But other types like numbers and bools do not have this .keyword sub-field, so aggregations for those must look like this (which would fail for text fields):

"aggs": {
    "something": {
        "terms": {
            "field": "something"
        }
    }
}

Is there any way to specify a terms aggregation that basically says "if something.keyword exists, use that, otherwise just use something", and without taking a significant performance hit?

Requiring datatype information to be provided at query time might be an option for me, but ideally I want to avoid it if possible.

Todd Menier

1 Answer

If the primary use case is aggregations, it may be worth changing the dynamic mapping for string properties so that they are indexed as a keyword datatype, with a multi-field sub-field indexed as a text datatype, i.e. in dynamic_templates:

{
  "strings": {
    "match_mapping_type": "string",
    "mapping": {
      "type": "keyword",
      "ignore_above": 256,
      "fields": {
        "text": {
          "type": "text"
        }
      }
    }
  }
},
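To show where that template entry could sit and what it buys at query time, here is a minimal Python sketch that only builds request bodies as plain dicts. It assumes the typeless mapping layout of Elasticsearch 7 and later (older versions nest `dynamic_templates` under a mapping type name), and the helper names `index_body`, `terms_agg` and `match_query` are hypothetical, not part of any client library.

```python
import json

# The template from the answer: dynamically mapped strings become `keyword`,
# with a `text` sub-field for full-text search.
STRINGS_TEMPLATE = {
    "strings": {
        "match_mapping_type": "string",
        "mapping": {
            "type": "keyword",
            "ignore_above": 256,
            "fields": {"text": {"type": "text"}},
        },
    }
}


def index_body():
    """Body for index creation (assumes typeless mappings, ES 7+)."""
    return {"mappings": {"dynamic_templates": [STRINGS_TEMPLATE]}}


def terms_agg(field):
    """Terms aggregation on the bare field name, whatever its datatype."""
    return {"aggs": {field: {"terms": {"field": field}}}}


def match_query(field, text):
    """Full-text search now targets the `.text` sub-field of a string field."""
    return {"query": {"match": {field + ".text": text}}}


if __name__ == "__main__":
    print(json.dumps(index_body(), indent=2))
    print(json.dumps(terms_agg("something"), indent=2))
```

With this layout the aggregation no longer needs to know whether `something` is a string, a number or a bool. The trade-off is that the datatype question moves to the search side, since only string fields have the `.text` sub-field.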
Russ Cam
  • Ah, I see, you're flipping it so `text` is the sub-field instead of `keyword`. Great idea! I wouldn't say it's the _primary_ use case, I do still need full-text search (and need to deal with unknown schemas there too), but I'll give this a try and see if I can get it to work without ending up with a similar problem on the search side. – Todd Menier Aug 04 '18 at 22:21
  • Another approach might be to retrieve the mapping, cache it for a certain period of time in the application, and use that to guide which fields should be targeted. – Russ Cam Aug 05 '18 at 04:15
  • The `dynamic_templates` solution worked great, thank you! Although I did find myself in an all too familiar place: eager to implement a reasonably simple, _well-documented_ Elasticsearch operation, then spending half a day pulling my hair out trying to figure out how to translate it to Nest. I'm seriously considering switching all my code over to use the low-level client unless you can talk me off the ledge. ;) – Todd Menier Aug 09 '18 at 16:43
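The mapping-cache idea from the comments could be sketched roughly as follows in Python. This is an illustration under assumptions, not a tested integration: `fetch` is a hypothetical callable that returns the `properties` dict of the index mapping (for example, extracted from a get-mapping response), the names `resolve_agg_field` and `MappingCache` are made up for this example, and only top-level fields are handled.

```python
import time


def resolve_agg_field(properties, name):
    """Pick the field path to aggregate on for a top-level field `name`.

    `properties` is assumed to be the "properties" dict of an index mapping.
    """
    field = properties[name]  # raises KeyError for unmapped fields
    if field.get("type") == "text":
        # Text fields cannot be aggregated on by default, so look for a
        # keyword sub-field such as the default `something.keyword`.
        for sub_name, sub in field.get("fields", {}).items():
            if sub.get("type") == "keyword":
                return name + "." + sub_name
        raise ValueError("no keyword sub-field for text field: " + name)
    # Numbers, booleans, keywords, etc. are aggregated on directly.
    return name


class MappingCache:
    """Caches the mapping for `ttl_seconds` to avoid a lookup per query."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch  # hypothetical: returns the properties dict
        self._ttl = ttl_seconds
        self._clock = clock
        self._props = None
        self._loaded_at = 0.0

    def properties(self):
        now = self._clock()
        if self._props is None or now - self._loaded_at >= self._ttl:
            self._props = self._fetch()
            self._loaded_at = now
        return self._props


if __name__ == "__main__":
    sample = {
        "title": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
        "count": {"type": "long"},
    }
    cache = MappingCache(lambda: sample)
    for f in ("title", "count"):
        print(f, "->", resolve_agg_field(cache.properties(), f))
```

A field that first appears after the cache was loaded will be missing until the TTL expires, so a real application would probably want to refresh once on a `KeyError` before giving up.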