
After performing a reindex on a 75GB index, the new one went to 79GB.

Both indexes have the same doc count (54,123,676) and exactly the same mapping. The original index has 6×2 shards (6 primaries, each with 1 replica) and the new one has 3×2 shards.

The original index also has 75,857 deleted documents which were not copied across, so we are pretty stumped as to how it could be smaller than the new index at all, let alone by a whole 4GB.
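(For a finer-grained comparison, a per-segment size breakdown for both indexes could be pulled with something like the request below; `original_index` and `new_index` are placeholders for the actual index names.)

GET /_cat/segments/original_index,new_index?v&h=index,shard,prirep,segment,size,docs.count,docs.deleted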

Original Index

{
    "_shards": {
        "total": 12,
        "successful": 12,
        "failed": 0
    },
    "_all": {
        "primaries": {
            "docs": {
                "count": 54123676,
                "deleted": 75857
            },
            "store": {
                "size_in_bytes": 75357819717,
                "throttle_time_in_millis": 0
            },
            ...
            "segments": {
                "count": 6,
                "memory_in_bytes": 173650124,
                "terms_memory_in_bytes": 152493380,
                "stored_fields_memory_in_bytes": 17914688,
                "term_vectors_memory_in_bytes": 0,
                "norms_memory_in_bytes": 79424,
                "points_memory_in_bytes": 2728328,
                "doc_values_memory_in_bytes": 434304,
                "index_writer_memory_in_bytes": 0,
                "version_map_memory_in_bytes": 0,
                "fixed_bit_set_memory_in_bytes": 0,
                "max_unsafe_auto_id_timestamp": -1,
                "file_sizes": {}
            }
            ...

New Index

{
    "_shards": {
        "total": 6,
        "successful": 6,
        "failed": 0
    },
    "_all": {
        "primaries": {
            "docs": {
                "count": 54123676,
                "deleted": 0
            },
            "store": {
                "size_in_bytes": 79484557149,
                "throttle_time_in_millis": 0
            },
            ...
            "segments": {
                "count": 3,
                "memory_in_bytes": 166728713,
                "terms_memory_in_bytes": 145815659,
                "stored_fields_memory_in_bytes": 17870464,
                "term_vectors_memory_in_bytes": 0,
                "norms_memory_in_bytes": 37696,
                "points_memory_in_bytes": 2683802,
                "doc_values_memory_in_bytes": 321092,
                "index_writer_memory_in_bytes": 0,
                "version_map_memory_in_bytes": 0,
                "fixed_bit_set_memory_in_bytes": 0,
                "max_unsafe_auto_id_timestamp": -1,
                "file_sizes": {}
            }
            ...

Any clues?

Ian
  • Have you waited a little? ES merges segments in the background and could free some space over time, but I have no idea how long it would need for a 75GB index. – LeBigCat Feb 07 '19 at 16:59
  • Yeah it's been almost a week now, with no changes – Ian Feb 08 '19 at 14:22

1 Answer


You should use the segment merge feature. Since segments are immutable, ES always writes new ones and merges them slowly in the background, but the request below should help: it merges segments and frees up space. Be aware that this request is fairly heavy, so run it during off-peak hours.

POST /_forcemerge?only_expunge_deletes=true
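If the goal is just to shrink the new index, the merge can also be scoped to that index and the resulting segment layout checked afterwards (the index name here is a placeholder):

POST /new_index/_forcemerge?max_num_segments=1
GET /_cat/segments/new_index?v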

ozzimpact
  • I had run `POST /_forcemerge?max_num_segments=1`, but after processing it the result was still the same size. I don't think `only_expunge_deletes` will help much, as there are 0 deleted documents – Ian Feb 08 '19 at 14:26
  • It is not about deleted documents. After a reindex operation there are lots of segments waiting to be merged; this request just makes that process faster. After executing it you should wait 5-10 minutes. – ozzimpact Feb 08 '19 at 17:08
  • I re-ran the `_forcemerge`, this time with `only_expunge_deletes=true`, but the size is still 79484557149 bytes, even after 30mins :( I've updated my question to add the `segments` nodes which might hopefully have some more info – Ian Feb 08 '19 at 18:18
  • Are both indexes in same cluster? – ozzimpact Feb 10 '19 at 19:57
  • Yup, both in the same cluster – Ian Feb 11 '19 at 09:46