4

I'm coming from a long-term SQL background -- NoSQL (and ElasticSearch) is very new to me.

An engineer on my team is constructing a new index for document storage, and they have mapped all short/int/long values to strings for use in term queries.

This surprised me, as a SQL index with an SmallInt/Int/BigInt key will perform much better than that same set of values turned into a VarChar(X) and indexed accordingly.

I was pointed to this article: https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html

Which has this comment:

Consider mapping a numeric identifier as a keyword if:

  • You don’t plan to search for the identifier data using range queries.
  • Fast retrieval is important. term query searches on keyword fields are often faster than term searches on numeric fields.

I'm happy take this at face value, but I don't understand why this is.

Assuming an exact match type query (e.g. ID = 100), can anyone speak to the mechanics of ElasticSearch (or NoSQL in general), that would explain why a query against a stringified numeric value is faster than a query against numeric values directly?

Jake McGraw
  • 135
  • 4
  • 11

1 Answers1

2

Basically, keywords are stored in the inverted index and the lookup is really fast, which makes keyword the ideal type for term/s queries (i.e. exact match)

Numeric values, however, are stored in BKD trees (since ES 5/Lucene 6) which are more optimal than the inverted index for numeric values and also optimized for range-like queries.

The downside is that searching for an exact numerical value within a BKD tree is less performant than looking up the term in the inverted index.

So the take away from this is that if your IDs are numeric and you plan on querying them in ranges, map them with a numeric type like integer, etc. But, if you plan on matching your ID in a term/exact-like fashion, then store them as string with a keyword type.

Val
  • 207,596
  • 13
  • 358
  • 360