
I have an index in production with 1 replica (taking ~1 TB in total). New data is constantly coming into this index (lots of updates and creates). When I created a copy of this index by running _reindex (with the same data and 1 replica as well), the new index took only 600 GB. It looks like the original index contains a lot of junk that could be cleaned up, but I'm not sure how to do it.
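For reference, the copy was created with roughly the following request (the index names here are placeholders):

    POST _reindex
    {
      "source": { "index": "my-index" },
      "dest":   { "index": "my-index-copy" }
    }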

My questions: how can I clean up the index (without _reindex), why is this happening, and how can I prevent it in the future?

Igor Benikov

1 Answer


Lucene segment files are immutable, so when you delete or update a document (Lucene can't update a doc in place), the old version is only marked as deleted but not actually removed from disk. Elasticsearch runs merge operations periodically to "defragment" the data, but you can also trigger a merge manually with _forcemerge (try running it with only_expunge_deletes as well; it may be faster).
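A minimal sketch, assuming the index is named my-index (check the docs.deleted column first to confirm the overhead really comes from deleted docs):

    # Show how many deleted-but-not-yet-merged docs the index carries
    GET _cat/indices/my-index?v&h=index,docs.count,docs.deleted,store.size

    # Reclaim space from deleted docs only (usually the cheaper option)
    POST my-index/_forcemerge?only_expunge_deletes=true

    # Or merge down aggressively; best reserved for indices no longer being written to
    POST my-index/_forcemerge?max_num_segments=1

Keep in mind that _forcemerge is an I/O-heavy operation, and the Elasticsearch docs recommend running it only against indices you have finished writing to; on an index that is still receiving updates, only_expunge_deletes is the safer choice, and the space will otherwise be reclaimed gradually by the normal background merges anyway.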

Also, make sure your shards are sized sensibly and use ILM (index lifecycle management) rollover to keep index size under control.
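For the rollover part, a minimal ILM policy sketch (the policy name and thresholds are placeholders; max_primary_shard_size requires a reasonably recent Elasticsearch version, older versions use max_size instead):

    PUT _ilm/policy/my-rollover-policy
    {
      "policy": {
        "phases": {
          "hot": {
            "actions": {
              "rollover": {
                "max_primary_shard_size": "50gb",
                "max_age": "30d"
              }
            }
          }
        }
      }
    }

You would then attach the policy to an index template via the index.lifecycle.name and index.lifecycle.rollover_alias settings and write through the alias, so Elasticsearch starts a fresh index whenever a threshold is hit.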

ilvar