
Setup

The following is my ES setup.

  • Using Elastic Cloud
  • 3 shards with 3 replicas
  • Size is 5 GB (3.2 million documents)

Problem Statement

When I run a wildcard search, I get a different result each time. I believe the search goes to different shard copies and whichever responds fastest is returned first (the scores are the same).

  1. If I build my index with a single shard instead of 3 shards for 3.2 million records (5 GB), will it impact performance?

or

  2. What is the best way to query multiple shards and get the same result every time, ideally with a fast response time (though that is not the priority)?

PS: I've gone through the articles below, but I didn't get a clear idea.

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-preference.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/size-your-shards.html

Thanks in advance.

Moulali Shaik
  • I don't really understand why you have this problem. The mapping/settings, an example and a query would help. A good target size for a shard is around 30 GB, so you can shrink to 1 shard (with 1 or more replicas) and do a force merge. Performance should improve. – Jaycreation Nov 17 '20 at 09:56
  • Thanks @Jaycreation. I tried a single shard with 3 replicas. Still no luck. Below are the example mapping and query for your reference. Mapping: "Field1" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } } Query: "query": { "prefix": { "Field1.keyword": { "value": "q8" } } } Result: on the 1st request the top hit is "Field1" : "q8 zevenbergschen hoek"; on the 2nd request the top hit is "Field1" : "q8 zuidweg b.v.". – Moulali Shaik Nov 17 '20 at 13:00
  • I think I understand. Your top hits have exactly the same score. If you want more consistent results, you should specify a sort. Try adding this after your query: "sort" : [ "_score", "field1" ] https://www.elastic.co/guide/en/elasticsearch/reference/current/sort-search-results.html – Jaycreation Nov 17 '20 at 14:07
  • @Jaycreation Sorting is not a good option in my case. However, I tried zero replicas with 3 shards, and now it returns identical results every time. It seems the query is executed in parallel on the replicas and the fastest results are returned first. Thanks for your quick response. – Moulali Shaik Nov 17 '20 at 14:30
  • You are totally right. But note that without replicas, you have no redundancy for your data if a node fails. – Jaycreation Nov 17 '20 at 14:50
  • Yes, true. I'm wondering what other ways there are to do this. Would a weekly backup into another index help instead of replicas? I need some suggestions on this. – Moulali Shaik Nov 18 '20 at 04:09
  • You can set up snapshots, but it's not exactly the same thing. Snapshots are for crash recovery; replicas ensure there is no service interruption. Good practice is to have at least 1 replica (it should also help performance if you have several nodes) and to find the best sort for your use case (score first, then alphabetical, looks like a logical solution). – Jaycreation Nov 18 '20 at 07:10
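
The two ideas discussed in the comments (a tie-breaking sort, and keeping replicas while pinning requests to the same shard copies) could be combined roughly as in the sketch below. It only builds the request body and URL parameters; it does not talk to a cluster. Some assumptions: the tie-breaker sorts on the `Field1.keyword` sub-field from the mapping above rather than on the `text` field itself, since `text` fields are not sortable by default; the `preference` value is an arbitrary custom string as described in the search-request-preference page linked in the question; and the function name and the session string `session-123` are hypothetical names chosen only for illustration.

```python
import json


def build_prefix_search(prefix, field="Field1.keyword", session_id="session-123"):
    """Build URL params and a body for a prefix search with a stable order.

    `session_id` is a hypothetical value; any custom string that does not
    start with "_" can serve as a preference value.
    """
    body = {
        # Same prefix query as in the comment above.
        "query": {"prefix": {field: {"value": prefix}}},
        # Score first, then the keyword value as a tie-breaker, so hits
        # with equal scores come back in a fixed alphabetical order.
        "sort": ["_score", {field: "asc"}],
    }
    # Repeating the same custom preference string sends repeated requests
    # to the same shard copies, as long as the cluster state is unchanged.
    params = {"preference": session_id}
    return params, body


params, body = build_prefix_search("q8")
print(params)
print(json.dumps(body, indent=2))
```

The body would be sent to the index's `_search` endpoint with `preference` as a query-string parameter. If two documents can share exactly the same `Field1.keyword` value, a further tie-breaker on some unique field would be needed for a fully deterministic order.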
