
I just did a full reindex from a dump of a previous index, but the newly created index is double the size of the previous one, even before it has indexed all the documents. What could be the reason?

The previous index was 3.7 GB and the new one is 7 GB.

Update: It has now come down to 5.2 GB (probably due to segment merging), but as you can see it is still larger than the previous index, which is 3.7 GB.

[screenshot: index size comparison]

Here's the shards output for both the indices: [screenshot: `_cat/shards` output for both indices]

sheharbano
  • The index size can show different values until the process is completed, but there must be a reason for it to double. Can you check the following: replica count, index mapping, segment count? – Musab Dogan Jul 28 '23 at 06:58
  • Old index has: Shards - 3, Num committed segments - 31, Num search segments - 31. New index has: Shards - 3, Num committed segments - 29, Num search segments - 29. I haven't changed anything in the mapping. – sheharbano Jul 28 '23 at 09:00
  • Can you also check the docs.deleted count? – Musab Dogan Jul 28 '23 at 15:31
  • Old index has a deleted count of 6605341 and the new index has a deleted count of 19848 – sheharbano Jul 28 '23 at 16:09
  • It looks like everything is almost the same... Lastly, can you check the `number of replicas`? And please add some screenshots and API call outputs to the thread. – Musab Dogan Jul 31 '23 at 09:41
  • Both have 1 primary and 2 replica shards. I have updated the question with a screenshot. – sheharbano Aug 01 '23 at 12:02
  • Thanks for sharing the screenshot. I believe it's because of the unassigned shards. Can you share the following API output? `GET _cat/shards/index_name_1,index_name_2?v` Yes, we definitely found it. As you can see, the **pri.store.size** and **store.size** values differ for the big index, which means one replica of the big index is allocated while 2 replicas of the small index are not allocated. – Musab Dogan Aug 01 '23 at 12:52
  • I've updated the question with the screenshot. – sheharbano Aug 01 '23 at 12:55
  • You can check why the shards are unassigned with the following API call: `GET _cluster/allocation/explain` – Musab Dogan Aug 01 '23 at 12:56
  • Note: It's normal to see `objects_12` **pri.store.size** higher than `objects_27` because of the `docs.deleted` count. – Musab Dogan Aug 01 '23 at 12:58
  • Yes, I can see that the bigger index has one of the replicas unassigned but how does that affect the size of the primary index? – sheharbano Aug 01 '23 at 13:04
  • Also, could you please post your finding as an answer? – sheharbano Aug 01 '23 at 13:06
  • Yes, I will share it as an answer. Unassigned shards affect the `store.size`: store.size is the sum of all shard sizes, and if a shard is unassigned it isn't counted. – Musab Dogan Aug 01 '23 at 13:16

1 Answer


The difference between the old and new index sizes is caused by unassigned shards.

GET _cat/shards/index_name_1,index_name_2?v

The above API call shows that some shards of the small index are unassigned. Unassigned shards affect the store.size: store.size is the sum of the sizes of all allocated shards, so unassigned shards are not counted.
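
For example, you can limit the output to the relevant columns with the standard `_cat` `h` parameter (the index names below are placeholders, as in the call above):

GET _cat/shards/index_name_1,index_name_2?v&h=index,shard,prirep,state,docs,store,unassigned.reason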

The pri.store.size and store.size values differ for the big index. This means one of the replicas of the big index is allocated, while 2 replicas of the small index remain unassigned.
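
To compare the two totals side by side, you can also query `_cat/indices` with the relevant columns (again, the index names are placeholders):

GET _cat/indices/index_name_1,index_name_2?v&h=index,pri,rep,docs.count,docs.deleted,pri.store.size,store.size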

You can check why the shards are unassigned with the following API call.

GET _cluster/allocation/explain
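
By default this explains an arbitrary unassigned shard. To ask about a specific shard, the same API accepts a request body (the index name and shard number below are placeholders):

GET _cluster/allocation/explain
{
  "index": "index_name_1",
  "shard": 0,
  "primary": false
}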

Elasticsearch will retry 5 times to allocate the shards. If allocation has failed 5 times, there is no automatic process that will try to allocate those shards again. You can force the allocation retry with the following API call.

POST _cluster/reroute?retry_failed=true

Please note that if you are hitting a disk watermark, e.g. insufficient disk space, the allocation will fail again. You can free up disk space by removing old indices, old Elasticsearch logs, etc.
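
A quick way to check whether disk space is the issue is the `_cat/allocation` API, and optionally the cluster settings for the disk watermarks (the `filter_path` value here is just one way to narrow the output and may need adjusting):

GET _cat/allocation?v

GET _cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation.disk*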

Musab Dogan