
The documentation and recommendations for the stored_fields feature in Elasticsearch have been changing. In the latest version (7.9), stored_fields is not recommended - https://www.elastic.co/guide/en/elasticsearch/reference/7.9/search-fields.html Is there a reason for this?

In version 7.4.0, by contrast, there is no such negative comment - https://www.elastic.co/guide/en/elasticsearch/reference/7.4/mapping-store.html

What is the guidance on using this feature? Is _source filtering a better option? I ask because another doc says that _source filtering kills performance - https://www.elastic.co/blog/found-optimizing-elasticsearch-searches

If you use _source or _fields you will quickly kill performance. They access the stored fields data structure, which is intended to be used when accessing the resulting hits, not when processing millions of documents.

What is the best way to filter fields without killing performance in Elasticsearch?

Amit
Arjun
  • Just a quick note that the blog article dates back to 2014. That is an eternity ago: ES 2.x had not even been released, so you're comparing apples and oranges. – Val Oct 20 '20 at 03:55

1 Answer


Source filtering is the recommended way to fetch fields. The blog post has confused you because you missed the use case its warning applies to. Please read the statement below carefully.

_source is intended to be used when accessing the resulting hits, not when processing millions of documents.

By default, Elasticsearch returns only 10 hits per search, which you can change with the size parameter. If you only want a few field values in your search results, source filtering makes perfect sense, because it is applied to the final result set (not to every document matching the query).
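As a minimal sketch of what such a request can look like, the snippet below builds a search body that limits the number of hits with size and trims `_source` with includes and excludes. The helper name `build_search_body` and the field names (`title`, `meta.*`, `meta.raw_payload`) are made up for illustration; only the `size`, `query` and `_source` keys come from the search API.

```python
import json


def build_search_body(query, includes, excludes=(), size=10):
    """Return a search request body that trims _source on the returned hits."""
    return {
        "size": size,  # number of hits to return; 10 is the default
        "query": query,
        "_source": {
            "includes": list(includes),  # field patterns to keep
            "excludes": list(excludes),  # field patterns to drop
        },
    }


# Hypothetical fields: keep the title and metadata, drop one large sub-field.
body = build_search_body(
    query={"match": {"title": "elasticsearch"}},
    includes=["title", "meta.*"],
    excludes=["meta.raw_payload"],
)
print(json.dumps(body, indent=2))
```

The resulting JSON is what you would send as the body of a `_search` request; the filtering is then applied only to the hits that come back.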

If, on the other hand, you use a script that reads field values from _source to filter the search results, the query has to load _source for every document it evaluates, which can mean scanning the whole index. That is the second part of the statement above (not when processing millions of documents).
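To make the difference concrete, here is a toy model in plain Python. It is not how Elasticsearch is implemented; the `ToyIndex` class and its methods are hypothetical and only count how often a stored JSON source gets deserialized in each case.

```python
import json


class ToyIndex:
    """Toy stand-in for an index: stores each document as a JSON string."""

    def __init__(self, documents):
        self.raw = [json.dumps(d) for d in documents]
        self.parse_count = 0  # how many times a stored source was deserialized

    def load_source(self, doc_id):
        self.parse_count += 1
        return json.loads(self.raw[doc_id])

    def fetch_hits(self, hit_ids, fields):
        """Source filtering: parse only the hits that are being returned."""
        hits = []
        for doc_id in hit_ids:
            source = self.load_source(doc_id)
            hits.append({f: source[f] for f in fields})
        return hits

    def scan_with_script(self, predicate):
        """Script-style filtering on the source: parse every document."""
        return [i for i in range(len(self.raw)) if predicate(self.load_source(i))]


index = ToyIndex({"id": i, "price": i % 100, "blob": "x" * 50} for i in range(10_000))

# Case 1: return 10 hits with two fields each.
index.fetch_hits(range(10), ["id", "price"])
fetch_parses = index.parse_count

# Case 2: filter the whole index by reading a value out of each source.
index.parse_count = 0
matches = index.scan_with_script(lambda src: src["price"] < 5)
scan_parses = index.parse_count

print(fetch_parses, scan_parses, len(matches))  # 10 10000 500
```

In this model the cost of source filtering grows with the number of returned hits, while the script-style scan grows with the size of the index.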

Apart from the above, all field values are already stored as part of the _source field, which is enabled by default. So you do not need to spend extra space by explicitly marking fields as stored (store is disabled by default to keep the index small) just to retrieve their values.
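For comparison, this is roughly what the stored-field route looks like, written as Python dicts: a mapping that sets `store` on one field and a search body that requests it through `stored_fields`. The field names `title` and `content` and the helper `stored_field_names` are assumptions for the example.

```python
import json

# Hypothetical index mapping: only "title" is stored separately from _source.
index_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text", "store": True},
            "content": {"type": "text"},  # store defaults to false
        }
    }
}

# Search body that asks for the stored copy of "title" instead of _source.
search_body = {
    "query": {"match": {"content": "performance"}},
    "stored_fields": ["title"],
}


def stored_field_names(mapping):
    """List the fields in a mapping that have store enabled."""
    props = mapping["mappings"]["properties"]
    return sorted(name for name, cfg in props.items() if cfg.get("store"))


print(stored_field_names(index_mapping))  # ['title']
print(json.dumps(search_body, indent=2))
```

With `_source` left enabled, the stored copy of `title` is kept in addition to the copy inside `_source`, which is the extra space mentioned above.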

Amit
  • Thank you for the response. If `_source` contains a field with a large JSON value, will excluding it kill query performance? Would Elasticsearch have to deserialize `_source`, exclude the field and then re-serialize the response? – Arjun Oct 20 '20 at 04:00
  • When you fetch field values through `_source`, it has to deserialize and serialize the data anyway. It depends on the level at which you want to optimize: if you have large JSON, you can also disable `_source` completely and explicitly store the few fields you need in your search results. – Amit Oct 20 '20 at 04:08
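A sketch of the mapping described in that last comment, assuming hypothetical field names (`order_id`, `status`, `payload_text`) and a made-up helper `mapping_without_source`: `_source` is disabled and only the fields needed in results are stored. Keep in mind that disabling `_source` also means giving up features that rely on it, such as the update and reindex APIs, so treat this as an option to evaluate rather than a default.

```python
import json


def mapping_without_source(stored_fields, other_fields):
    """Build a mapping with _source disabled and selected fields stored."""
    # Fields that must be retrievable in search results get store: true.
    properties = {name: {"type": "keyword", "store": True} for name in stored_fields}
    # Remaining fields are indexed for search but cannot be returned.
    properties.update({name: {"type": "text"} for name in other_fields})
    return {
        "mappings": {
            "_source": {"enabled": False},
            "properties": properties,
        }
    }


mapping = mapping_without_source(["order_id", "status"], ["payload_text"])
print(json.dumps(mapping, indent=2))
```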