Elasticsearch database sync

Question

I'm using the jdbc river to sync Elasticsearch with a database. The known problem is that rows deleted from the database remain in ES; the jdbc river plugin doesn't solve that. The author of the jdbc river suggested a way of solving the problem:

A good method would be windowed indexing. Each timeframe (maybe once per day or per week) a new index is created for the river, and added to an alias. Old indices are to be dropped after a while. This maintenance is similar to logstash indexing, but it is outside the scope of a river.

My question is: what does that mean, precisely?

Let's say I have a table in the database called table1 with a million rows. My attempt is as follows:

Create a river called river1 with index1. index1 contains the indexed rows of table1 and is added to an alias.
Some rows are deleted from table1 during the day, so every night I create another river called river2 with index2, which contains only what is now present in table1.
Remove the old index1 from the alias and add index2 to the alias.
Delete the old index1.

Is that the right way?

score 2 · Answer 1 · answered Mar 03 '15 at 11:18

How about using the _ttl field? Define a static _ttl in the SQL-statement to be longer than the SQL-update frequency.

The SQL would be something like this when the river is scheduled to run more often than once per hour:

"select '1h' as _ttl, some_id as _id, ..."

This way the _ttl is refreshed on every river run for rows that still exist, while deleted rows are not refreshed and are removed from ES when their _ttl expires.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-ttl-field.html
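
One detail to keep in mind: in Elasticsearch 1.x the _ttl field is disabled by default, so it has to be enabled in the type mapping before the values coming from the river have any effect. Below is a minimal sketch in Python that only builds the mapping body and the river SQL string. The type name "mytype", the extra column "name" and the helper names are placeholders chosen for illustration, not part of the setup described above.

```python
import json


def ttl_mapping(type_name):
    """Mapping body that enables _ttl for one type (ES 1.x syntax)."""
    return {type_name: {"_ttl": {"enabled": True}}}


def river_sql(ttl, id_column, other_columns, table):
    """Build the river SELECT with a constant _ttl column on every row.

    other_columns and table are supplied by the caller; nothing here
    guesses the real schema.
    """
    columns = ["'%s' as _ttl" % ttl, "%s as _id" % id_column]
    columns += list(other_columns)
    return "select %s from %s" % (", ".join(columns), table)


if __name__ == "__main__":
    # Body to send with the put mapping API for the target index.
    print(json.dumps(ttl_mapping("mytype")))
    # SQL string to place in the river definition (hypothetical columns).
    print(river_sql("1h", "some_id", ["name"], "table1"))
```

The TTL value should stay longer than the interval between river runs, as described above, otherwise live rows could expire between two runs.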

score 0 · Answer 2 · answered Mar 09 '15 at 12:35

Yes, it can be done using the _ttl field, but I solved it using scripts.

Every night a script starts indexing the table into a new index created for that day. Indexing can last a few hours.

Another script periodically reads the output of localhost:9200/_river/jdbc/*/_state?pretty and checks whether all rivers are finished (by checking for the existence of the lastEndDate field). When all rivers are finished, the alias is switched to the newly created index and the old index is dropped.
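
The pieces of such a script could look roughly like the sketch below (Python, no network calls). It is not the actual script: the index naming scheme, the helper names and the assumed shape of the river state documents are illustrative. The only Elasticsearch-specific part is the body for POST /_aliases, where a remove and an add in the same request are applied atomically, so searches against the alias never see both indices or neither.

```python
import datetime


def daily_index_name(prefix, day):
    """Name for the index created on a given day, e.g. table1-2015.03.09."""
    return "%s-%s" % (prefix, day.strftime("%Y.%m.%d"))


def contains_key(obj, key):
    """Return True if key appears anywhere in a nested dict/list structure."""
    if isinstance(obj, dict):
        return key in obj or any(contains_key(v, key) for v in obj.values())
    if isinstance(obj, list):
        return any(contains_key(v, key) for v in obj)
    return False


def all_rivers_finished(states):
    """states: list of parsed river state documents (shape is an assumption).

    A river counts as finished once a lastEndDate field shows up in its state.
    """
    return bool(states) and all(contains_key(s, "lastEndDate") for s in states)


def alias_swap_body(alias, old_index, new_index):
    """Request body for POST /_aliases that moves the alias in one step."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    today = datetime.date(2015, 3, 9)
    old = daily_index_name("table1", today - datetime.timedelta(days=1))
    new = daily_index_name("table1", today)
    print(alias_swap_body("table1", old, new))
```

After the alias has been switched, the old index can be removed with a DELETE request on its name.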
