Synchronizing data from MSSQL to Elasticsearch using Apache Kafka

Question

I'm currently running a text search in SQL Server, which is becoming a bottleneck, and I'd like to move it to Elasticsearch for obvious reasons. However, I know that I have to denormalize the data for best performance and scalability.

Currently, my text search includes some aggregation and joins multiple tables to get the final output. The joined tables aren't that big (up to 20GB per table), but they change (inserts, updates, deletes) irregularly: two of them once a week, the other one on demand x times a day.

My plan is to use Apache Kafka together with Kafka Connect to read CDC from my SQL Server, join this data in Kafka, and persist it in Elasticsearch. However, I cannot find any material telling me how deletes would be handled when the data is persisted to Elasticsearch.

Is this even supported by the default driver? If not, what are the possibilities? Apache Spark, Logstash?

The Confluent Elasticsearch connector doesn't currently support deletes via tombstones. It's probably a nice feature to add, so feel free to log an issue. — Randall Hauch, Aug 10 '17 at 15:37
@RandallHauch that's disappointing. I'll log an issue as you have suggested. — Evaldas Buinauskas, Aug 10 '17 at 16:59
@RandallHauch Is it possible that an empty document can be created instead of deleting it? That would also work fine in my case. — Evaldas Buinauskas, Aug 11 '17 at 17:38
I guess it depends on what happens on this line when `payload` is null, which is the case for a tombstone event. Like I mentioned, it'd be relatively easy to modify this code to do something else, such as use `Delete.Builder(key.id)` instead. The only trick is whether this will be okay if the doc in Elasticsearch happens to not exist. — Randall Hauch, Aug 11 '17 at 21:55
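
The tombstone handling discussed in the comments can be illustrated outside of Kafka Connect. The following is only a minimal sketch, not the connector's code: it assumes change records arrive as (key, value) pairs where a value of None stands for a tombstone, and it turns them into lines for the Elasticsearch bulk API. The function names and the index name "products" are hypothetical. It also shows one possible answer to the "doc does not exist" concern, by treating a 404 status on a delete item as success.

```python
import json


def to_bulk_body(records, index="products"):
    """Build an Elasticsearch bulk API body from (key, value) change records.

    Assumption: a value of None represents a tombstone (the row was deleted),
    so it becomes a delete action; any other value is indexed under the key.
    """
    lines = []
    for key, value in records:
        if value is None:
            # Delete actions have no source line after them.
            lines.append(json.dumps({"delete": {"_index": index, "_id": key}}))
        else:
            # Index actions are followed by the document itself.
            lines.append(json.dumps({"index": {"_index": index, "_id": key}}))
            lines.append(json.dumps(value))
    # The bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(lines) + "\n" if lines else ""


def item_succeeded(item):
    """Decide whether one item of a bulk response counts as a success.

    Each item is a one-key dict such as {"delete": {"status": 404, ...}}.
    A delete that reports 404 means the document was already gone, which
    this sketch chooses to accept so that deletes stay idempotent.
    """
    (action, result), = item.items()
    status = result.get("status", 0)
    if action == "delete" and status == 404:
        return True
    return 200 <= status < 300
```

The resulting body would be sent to the `_bulk` endpoint by whatever client is in use; that part is left out here because it depends on the setup.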

score 0 · Accepted Answer · answered Jul 15 '19 at 10:32

I am not sure whether this is possible in Kafka Connect by now, but it seems that it can be solved with NiFi.

If I understand the need correctly, here is the documentation for deleting Elasticsearch records with one of the standard NiFi processors:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-elasticsearch-5-nar/1.5.0/org.apache.nifi.processors.elasticsearch.DeleteElasticsearch5/
