Is it possible to achieve exactly-once semantics when processing a Kafka topic in a Spark Streaming application?
To achieve exactly-once semantics you need all of the following:
- Exactly once from the Kafka producer to the Kafka broker. This is achieved by the idempotent producer introduced in Kafka 0.11 (see the config sketch after this list). But is the Kafka 0.11 to Spark Streaming integration production ready? I found this JIRA ticket with lots of bugs.
- Exactly once from the Kafka broker to the Spark Streaming app. Can this be achieved? After a Spark Streaming app failure, the application can read some data twice, right? As a solution, can I transactionally persist the computation results together with the last handled event uuid to Redis? (A sketch of this pattern follows the list.)
- Exactly once when transforming data in the Spark Streaming app. This is an out-of-the-box property of RDDs: transformations are deterministic, so a partition recomputed after a failure yields the same result.
- Exactly once when persisting results. This is covered by the second point: transactionally persisting the last event uuid to Redis together with the results.
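
For the first point, a minimal sketch of enabling the idempotent producer, assuming Kafka 0.11+ client libraries, a broker on localhost:9092, and a hypothetical `events` topic:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    // Kafka >= 0.11: the broker deduplicates retried sends from this producer session
    props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true")
    // acks=all is implied by idempotence; shown here for clarity
    props.put(ProducerConfig.ACKS_CONFIG, "all")

    val producer = new KafkaProducer[String, String](props)
    producer.send(new ProducerRecord[String, String]("events", "some-key", "some-value"))
    producer.close()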
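For the second and fourth points, a sketch of the transactional pattern using spark-streaming-kafka-0-10 and the Jedis client. It stores Kafka offsets in Redis instead of an event uuid, which is the same idea: results and read position commit atomically in one Redis transaction, and on restart the stream resumes from the stored offsets, so no batch is ever applied twice. The topic name, Redis keys, endpoints, and the word-count computation are all assumptions for illustration:

    import scala.collection.JavaConverters._
    import org.apache.kafka.common.TopicPartition
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{HasOffsetRanges, KafkaUtils}
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import redis.clients.jedis.Jedis

    val ssc = new StreamingContext(new SparkConf().setAppName("exactly-once-sketch"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "exactly-once-app",
      // Offsets are committed to Redis by us, so Kafka auto-commit is disabled
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // On startup, resume from the offsets committed in the last successful batch
    val boot = new Jedis("localhost")
    val fromOffsets = boot.hgetAll("kafka-offsets").asScala.map { case (k, v) =>
      val Array(topic, partition) = k.split(":")
      new TopicPartition(topic, partition.toInt) -> v.toLong
    }.toMap
    boot.close()

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](Seq("events"), kafkaParams, fromOffsets))

    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      val counts = rdd.map(_.value).countByValue() // stand-in computation

      // Results and offsets go to Redis in one MULTI/EXEC transaction, so either
      // both commit or neither does; if the app dies mid-batch, the batch is
      // replayed from the old offsets and the results still land exactly once.
      val jedis = new Jedis("localhost")
      try {
        val tx = jedis.multi()
        counts.foreach { case (value, n) => tx.hincrBy("word-counts", value, n) }
        offsetRanges.foreach { or =>
          tx.hset("kafka-offsets", s"${or.topic}:${or.partition}", or.untilOffset.toString)
        }
        tx.exec()
      } finally jedis.close()
    }

    ssc.start()
    ssc.awaitTermination()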