Suppose I have three customer accounts whose data is stored in a single index, with one alias per account. The index is distributed across three shards.

For a given query, how is tf-idf calculated? Since everything is in a single index, are the term counts computed over the data of all three accounts (aliases)? I would like to know whether the term frequency and IDF calculations can be restricted to a single account, that is, to one alias.

Raghavi R

1 Answer

If you use routing with your aliases, a search through an alias is sent only to the shards that match its routing value, and tf-idf is calculated on those specific shards. Otherwise, the search goes to all shards of the index and tf-idf is calculated across the index. For more information, see custom routing in Elasticsearch here and here.
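As a rough illustration of routing with aliases, the sketch below builds the request body for Elasticsearch's `_aliases` endpoint, tying one alias to one routing value. The index name `customers`, the alias name `account_1`, the field `account_id`, and the helper `build_alias_action` are all hypothetical and chosen only for this example; the code only constructs the JSON and does not contact a cluster.

```python
import json


def build_alias_action(index, alias, account_id):
    """Build a body for POST /_aliases (hypothetical index, alias, and field names)."""
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": alias,
                    # Searches through the alias are routed with this value.
                    "routing": str(account_id),
                    # The filter hides other accounts' documents from the results.
                    "filter": {"term": {"account_id": account_id}},
                }
            }
        ]
    }


body = build_alias_action("customers", "account_1", 1)
print(json.dumps(body, indent=2))
```

Note that the filter only restricts which documents are returned. If two accounts happen to be routed to the same shard, that shard's term statistics still cover both of them.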

Update:

Index and shard definitions:

Data in Elasticsearch is organized into indices. Each index is made up of one or more shards. Each shard is an instance of a Lucene index, which you can think of as a self-contained search engine that indexes and handles queries for a subset of the data in an Elasticsearch cluster.
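Because each shard is its own Lucene index, term statistics depend on which documents share a shard. The toy sketch below uses a simplified IDF, log(N / df), which is an assumption made for illustration and not Elasticsearch's exact scoring formula. The documents, account labels, and shard layout are invented.

```python
import math


def idf(term, docs):
    """Simplified IDF, log(N / df); returns 0.0 when the term is absent."""
    n = len(docs)
    df = sum(1 for d in docs if term in d["text"].split())
    return math.log(n / df) if df else 0.0


# Hypothetical layout: account "a" alone on shard 0, accounts "b" and "c" on shard 1.
shards = {
    0: [
        {"account": "a", "text": "red apple"},
        {"account": "a", "text": "green apple"},
    ],
    1: [
        {"account": "b", "text": "red car"},
        {"account": "b", "text": "blue car"},
        {"account": "c", "text": "blue sky"},
    ],
}
all_docs = [d for docs in shards.values() for d in docs]
only_b = [d for d in shards[1] if d["account"] == "b"]

idf_red_shard0 = idf("red", shards[0])   # statistics of shard 0 only
idf_red_shard1 = idf("red", shards[1])   # statistics of shard 1 only
idf_red_index = idf("red", all_docs)     # statistics over the whole index
idf_blue_shard1 = idf("blue", shards[1])  # includes account "c" documents
idf_blue_only_b = idf("blue", only_b)     # what account "b" alone would give

print(idf_red_shard0, idf_red_shard1, idf_red_index)
print(idf_blue_shard1, idf_blue_only_b)
```

In this toy layout, the value for "blue" on shard 1 differs from the value computed over account "b" alone, because account "c" lives on the same shard.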

Kaveh
  • Hey, thanks. So, as I understand it, tf-idf is calculated per shard. Is there a way to route a search to one particular index only, so that tf-idf is calculated from the documents present in that index and ignores other indices in the same shard? – Raghavi R Jun 22 '21 at 14:42
  • Each shard belongs to exactly one index, but an index can have many shards, so when your search is routed to a specific shard, tf-idf is calculated within that shard only. You can also use a filter in your query if you want to filter documents. – Kaveh Jun 23 '21 at 07:58
  • I updated my answer to better describe shards and indices. – Kaveh Jun 23 '21 at 09:29