I'm using the JDBC river to keep Elasticsearch in sync with a database. The known problem is that rows deleted from the database remain in ES; the JDBC river plugin doesn't handle that. The author of the JDBC river suggested a way of solving the problem:
> A good method would be windowed indexing. Each timeframe (maybe once per day or per week) a new index is created for the river, and added to an alias. Old indices are to be dropped after a while. This maintenance is similar to logstash indexing, but it is outside the scope of a river.
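As far as I understand, the alias is the stable name that applications query while the physical index behind it rotates. Here is a minimal sketch of attaching a fresh index to an alias (the names `daily_20140101` and `myalias` are just my own illustrations):

```
# Create a fresh physical index (name is illustrative)
curl -XPUT 'localhost:9200/daily_20140101'

# Attach it to the stable alias that applications query
curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "add": { "index": "daily_20140101", "alias": "myalias" } }
  ]
}'
```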
My question is: what does that mean, concretely?
Let's say I have a table in the database called table1 with a million rows. My attempt is as follows:
- Create a river called river1 with an index index1. index1 contains the indexed rows of table1 and is added to the alias.
- Some rows are deleted from table1 during the day, so every night I create another river called river2 with an index index2, which contains only what is present in table1 at that moment.
- Remove the old index1 from the alias and add index2 to it (as sketched below, this swap can be done atomically).
- Delete the old index1.
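In curl terms, I imagine the nightly rotation would look roughly like this. This is only a sketch: the JDBC connection settings are placeholders, the exact river config keys depend on the river version, and I assume river2 has finished its initial run before the swap. The `_aliases` call applies both actions in one atomic step, so searches against the alias should never see a half-built index:

```
# 1. Create river2, writing into the fresh index2
#    (URL, credentials, and SQL are placeholders)
curl -XPUT 'localhost:9200/_river/river2/_meta' -d '{
  "type": "jdbc",
  "jdbc": {
    "url": "jdbc:mysql://localhost:3306/mydb",
    "user": "user",
    "password": "pass",
    "sql": "select * from table1",
    "index": "index2"
  }
}'

# 2. ...wait for river2 to finish indexing table1...

# 3. Atomically repoint the alias from index1 to index2
curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "remove": { "index": "index1", "alias": "table1_alias" } },
    { "add":    { "index": "index2", "alias": "table1_alias" } }
  ]
}'

# 4. Remove the old river and drop the old index
#    (rows deleted from the database disappear along with index1)
curl -XDELETE 'localhost:9200/_river/river1'
curl -XDELETE 'localhost:9200/index1'
```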
Is that the right way?