
I am working on reindexing my Elasticsearch indices whenever something changes. I have found two ways to implement this, but they look the same to me unless I am missing something.

My Elasticsearch service gets its data from the Postgres database of service B, which exposes a paginated endpoint.

Approach 1:

  1. Create an alias that points to the existing index.
  2. When a reindex is triggered, create a new index. Once reindexing is complete, switch the alias from the old index to the newly created one.
  3. Delete the old index.
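
The alias switch in step 2 can be sent as a single `POST /_aliases` request, so the remove and add happen together. Below is a minimal Python sketch that only builds the request body; the alias and index names are hypothetical, and sending the request with an HTTP client is left out.

```python
import json


def build_alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Build the body for POST /_aliases that moves `alias` to `new_index`.

    Both actions travel in one request, so readers of the alias are
    switched from the old index to the new one in a single step.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names, used only for illustration.
swap_body = build_alias_swap("products", "products_v1", "products_v2")
print(json.dumps(swap_body, indent=2))
```

Queries keep using the alias name throughout, so nothing on the read side has to change when the index behind it is replaced.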

Approach 2:

  1. Create a new index.
  2. Use the reindex API to copy the data from the old index to the new index, which applies the new changes to the old documents.
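
For comparison, step 2 of this approach corresponds to a `POST /_reindex` request whose body names a source and a destination index. This is a rough Python sketch that only builds that body; the index names are hypothetical, and it assumes the new index has already been created with the new mapping.

```python
import json


def build_reindex_body(old_index: str, new_index: str) -> dict:
    """Build the body for POST /_reindex.

    Elasticsearch reads documents from `old_index` and writes them into
    `new_index`; no data is fetched from outside the cluster.
    """
    return {
        "source": {"index": old_index},
        "dest": {"index": new_index},
    }


# Hypothetical names, used only for illustration.
reindex_body = build_reindex_body("products_v1", "products_v2")
print(json.dumps(reindex_body, indent=2))
```

Note that the copy only sees what is already stored in the old index.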

To me, both of these look the same. The disadvantage of approach 2 seems to be that it creates a new index name, so we would have to change the index name when querying.

Also, since reindexing would not be a frequent task, and I am already reading the data from a paginated endpoint and building the index from scratch, Approach 1 seems to make more sense to me.
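
To make the "read pages, build the index again" part concrete, here is a small sketch of turning one page of records into a `_bulk` request payload for the new index. It is written in Python rather than as a Spring Batch writer, and the index name, the `id` field, and the sample page are assumptions for illustration only.

```python
import json
from typing import Iterable, Iterator


def bulk_lines(index: str, docs: Iterable[dict], id_field: str = "id") -> Iterator[str]:
    """Yield the NDJSON lines for the _bulk API: one action line, then one source line."""
    for doc in docs:
        yield json.dumps({"index": {"_index": index, "_id": doc[id_field]}})
        yield json.dumps(doc)


def build_bulk_payload(index: str, page: Iterable[dict]) -> str:
    """Join the lines for one page; the _bulk API expects a trailing newline."""
    lines = list(bulk_lines(index, page))
    return "\n".join(lines) + "\n" if lines else ""


# A made-up page, standing in for one response from service B's paginated endpoint.
sample_page = [{"id": 1, "name": "first"}, {"id": 2, "name": "second"}]
payload = build_bulk_payload("products_v2", sample_page)
print(payload)
```

Each page would be sent to `POST /_bulk` in turn, and the alias is switched only after the last page has been indexed.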

halfer
newLearner
  • Please read [Under what circumstances may I add “urgent” or other similar phrases to my question, in order to obtain faster answers?](//meta.stackoverflow.com/q/326569) - the summary is that this is not an ideal way to address volunteers, and is probably counterproductive to obtaining answers. Please refrain from adding this to your questions. – halfer Jul 27 '20 at 09:08

1 Answer


In approach 1 you are using an alias. In approach 2 you are not.

Both would be the same if you added the alias switch to approach 2 as step 3, and deleting the old index as step 4.

Refer As you need to do little often.

Gibbs
  • Thanks for the answer. But then the question remains: what is the point of using the `reindex API`? Maybe I am missing something big here? – newLearner Jul 27 '20 at 09:15
  • Is it that the `reindex API` should be used where reindexing is more frequent? Because in my case it would not be a frequent task for sure. – newLearner Jul 27 '20 at 09:17
  • Without the reindex API, how do you do that in option 1? – Gibbs Jul 27 '20 at 09:20
  • As I am getting my data from service B, I am using a Spring Batch process. I get a List in the writer from service B, then create a new index and put the documents into it by reading all the data again. This is like creating the index for the very first time. I am sorry if this sounds stupid, as this is my first time working with ES. – newLearner Jul 27 '20 at 09:25
  • OK. You are recreating data from your spring job. Correct? – Gibbs Jul 27 '20 at 09:27
  • Yes I am recreating data using spring job. – newLearner Jul 27 '20 at 09:30
  • And you are saying that data can have new fields also. Am I correct? – Gibbs Jul 27 '20 at 09:31
  • Yes. If a requirement comes in to add a new field to `MyObject`, I need to add that field to the index, and in that case I will have to add it to my documents as well. – newLearner Jul 27 '20 at 09:33
  • OK. If downtime doesn't matter, then you can go for option 1. Otherwise go with two aliases. Does your Spring job create the mapping for the new indices? – Gibbs Jul 27 '20 at 10:36
  • Yeah. Also, if I want to go with approach 2, how will I get the values for the new fields in the old documents? – newLearner Jul 27 '20 at 10:39
  • You can have a custom [script processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/script-processor.html) to do that. – Gibbs Jul 27 '20 at 10:41
  • Yeah, but that script processor would not get the updated data from service B. I think approach 1 is the way to go. – newLearner Jul 27 '20 at 10:46
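
For reference, the script-processor route discussed in the comments above might look roughly like this: an ingest pipeline with a `script` processor, referenced from the reindex request through `dest.pipeline`. This is only a sketch in Python that builds the two request bodies; the pipeline id, field name, default value, and index names are all hypothetical.

```python
import json


def build_default_field_pipeline(field: str, default) -> dict:
    """Body for PUT /_ingest/pipeline/<id>: set `field` to `default` when it is missing."""
    return {
        "description": f"Fill in a default for '{field}' on documents that lack it",
        "processors": [
            {
                "script": {
                    "lang": "painless",
                    "source": "if (ctx[params.field] == null) { ctx[params.field] = params.value }",
                    "params": {"field": field, "value": default},
                }
            }
        ],
    }


def build_reindex_with_pipeline(old_index: str, new_index: str, pipeline_id: str) -> dict:
    """Body for POST /_reindex that routes every copied document through the pipeline."""
    return {
        "source": {"index": old_index},
        "dest": {"index": new_index, "pipeline": pipeline_id},
    }


# Hypothetical names and values, used only for illustration.
pipeline_body = build_default_field_pipeline("category", "unknown")
reindex_with_pipeline = build_reindex_with_pipeline("products_v1", "products_v2", "fill-category")
print(json.dumps(pipeline_body, indent=2))
print(json.dumps(reindex_with_pipeline, indent=2))
```

As noted in the last comment, a script like this can only default or derive values from what is already in the old documents; it cannot pull fresh values from service B.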