ElasticSearch performance considerations while mapping string fields as both text and keyword?

Question

I have a question regarding the tradeoffs/performance considerations to keep in mind while mapping string fields as both text and keyword vs just one of those.

I have a use case where mapping around 25-30 string fields as both text and keyword would be nice to have, but if there are serious performance considerations, I would instead go through them and map each field only as the type it will be searched as most often.

I have not been able to find much information online about this. Hence asking here.

Elasticsearch version 7.10. Thanks!

score 4 · Accepted Answer · answered Nov 25 '20 at 12:03

The default mappings provided by ES map a string field as both text and keyword mostly because it's convenient: the field can then be used in different contexts without having to think too hard about it. It's also a good way of bootstrapping new projects without worrying too much about that aspect until later in the project.

However, if you're truly serious about your mappings and the performance of your cluster, you should always give as much thought as possible to why you map a field in a certain way.

There are a few basic rules (but your mileage may always vary) in the following (non-exhaustive) list:

IDs, codes, keys, etc., that you usually use in exact searches can be mapped as keyword only (and/or wildcard depending on your search use cases).
If you have longer pieces of text closer to natural language that you might want to run full-text searches on, it's usually a good idea to map them as text.
The corollary to the previous rule is that if you know you'll never want to run full-text searches on some field, don't map it as text, as the analysis process adds a non-negligible overhead when indexing text fields.
...

As said, the above list is obviously non-exhaustive, but it gives you some pointers. The bottom line is that you need to think hard about your data and what you want to do with it. Once you know the use cases you need to support, you'll know how to map your fields. I would never leave a default text/keyword mapping in place if there's no reason for it.
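The three rules above can be sketched as an explicit mapping. This is only an illustration: the field names (`order_id`, `description`, `title`) are hypothetical, and the body is built as a plain Python dict so that it can be printed and sent as the request body of an index-creation call with whatever client you use. The `ignore_above` value of 256 on the sub-field mirrors what dynamic mapping generates by default.

```python
import json


def build_mappings():
    """Return an explicit mapping illustrating keyword-only, text-only and both."""
    return {
        "properties": {
            # Hypothetical ID field: exact-match searches only, so keyword only.
            "order_id": {"type": "keyword"},
            # Hypothetical free-text field: full-text search only, so text only.
            "description": {"type": "text"},
            # Hypothetical field that needs full-text search AND exact
            # matching, sorting or aggregations: text plus a keyword sub-field.
            "title": {
                "type": "text",
                "fields": {
                    "keyword": {"type": "keyword", "ignore_above": 256}
                },
            },
        }
    }


mappings = build_mappings()

if __name__ == "__main__":
    # Request body for creating an index with these mappings.
    print(json.dumps({"mappings": mappings}, indent=2))
```

Only `title` pays the cost of being indexed twice; the other two fields are indexed once, in the form they are actually queried in.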

I understand it's the best practice, but are the implications as @Kaveh mentioned? Does having a long description mapped as keyword really have such serious consequences for the speed of indexing and queries? I assume the other way around (having a short string as text) wouldn't be as severe — cah1r, Aug 31 '22 at 08:16
The keyword type is not meant for long description text, it would make no sense to index free-text as keyword, which is why one needs to really think about the nature of his data and what he wants to do with it. — Val, Aug 31 '22 at 08:35
I understand. The problem is that we already have multiple customers with index mappings where they unfortunately map long strings as keyword as well. Some of them probably have longer strings in certain fields than others. I know it's a difficult question, but when we tell them that they should put in the time to fix this, they will ask us how much performance gain we are talking about, and whether they will see a slight or a major improvement in their query and indexing speed. We could run some load tests for that, but if some data is already available it would be really beneficial — cah1r, Aug 31 '22 at 09:26
Those numbers don't exist; that clearly depends on each context (hardware, mappings, volume, etc.), and the same test would yield different results on different clusters having different configurations. Usually people start testing when they start feeling the pain :-) — Val, Aug 31 '22 at 09:31

score 2 · Answer 2 · answered Nov 25 '20 at 11:59

The performance of your search and indexing depends on the size of your string field. If you have a large string and map it as a keyword, it will have a heavy impact on your indexing and search performance. If you decide to map a field as both text and keyword, be sure to set ignore_above on the keyword, because Lucene’s term byte-length limit is 32766, which means Elasticsearch cannot index a string bigger than this as a keyword.

Also, the type of analyzer that you use for your string fields has an impact.
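The ignore_above advice can be sketched as follows. One detail this sketch assumes: `ignore_above` is a character count while the 32766 limit is in bytes, and a UTF-8 character can take up to 4 bytes, so a conservative value is 32766 / 4 = 8191. The helper names are hypothetical and the mapping is built as a plain Python dict; `"standard"` is used as the analyzer only as a placeholder.

```python
LUCENE_MAX_TERM_BYTES = 32766  # Lucene's term byte-length limit mentioned above


def safe_ignore_above(max_bytes_per_char=4):
    """Largest ignore_above (in characters) that stays within the byte limit.

    UTF-8 uses at most 4 bytes per character, hence the default of 4.
    """
    return LUCENE_MAX_TERM_BYTES // max_bytes_per_char


def exceeds_term_limit(value):
    """True if the UTF-8 encoding of value is longer than the term limit."""
    return len(value.encode("utf-8")) > LUCENE_MAX_TERM_BYTES


def text_with_keyword(ignore_above, analyzer="standard"):
    """Mapping for one field indexed as text plus a capped keyword sub-field."""
    return {
        "type": "text",
        "analyzer": analyzer,
        "fields": {
            "keyword": {"type": "keyword", "ignore_above": ignore_above}
        },
    }


if __name__ == "__main__":
    print(text_with_keyword(safe_ignore_above()))
```

Values longer than `ignore_above` are simply skipped for the keyword sub-field, while the text field still indexes them for full-text search.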
