
Is it possible to achieve exactly-once semantics when handling a Kafka topic in a Spark Streaming application?

To achieve exactly-once you need the following things:

  1. Exactly once from the Kafka producer to the Kafka broker. This is achieved by Kafka 0.11's idempotent producer. But is the Kafka 0.11 to Spark Streaming integration production ready? I found this JIRA ticket with lots of bugs.
  2. Exactly once from the Kafka broker to the Spark Streaming app. Can this be achieved? Because of Spark Streaming app failures, the application can read some data twice, right? As a solution, can I persist the computation results and the uuid of the last handled event to Redis transactionally?
  3. Exactly once when transforming data in the Spark Streaming app. This is an out-of-the-box property of RDDs.
  4. Exactly once when persisting results. This is solved by the 2nd point, by transactionally persisting the last event uuid to Redis.
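The idea in points 2 and 4 — committing the result and the last handled event uuid in one atomic step, and skipping any event whose uuid was already committed — can be sketched as follows. This is a minimal Python simulation, not real Spark or Redis code: a plain dict stands in for Redis, and all names are illustrative. In a real sink, the two writes would go through a Redis MULTI/EXEC transaction so they are applied together or not at all.

```python
# A dict stands in for Redis; in a real app the two writes below would be
# wrapped in a single MULTI/EXEC transaction.
store = {"results": [], "processed_uuids": set()}

def process_exactly_once(event):
    """Apply the event unless its uuid was already committed."""
    uid = event["uuid"]
    if uid in store["processed_uuids"]:
        return False  # duplicate delivery after a failure: ignore it
    result = event["value"] * 2  # placeholder for the real transformation
    # "Transaction": both writes are committed together before returning,
    # so a replayed event can never be applied a second time.
    store["results"].append(result)
    store["processed_uuids"].add(uid)
    return True

# Simulate at-least-once delivery: event "a" is replayed after a restart.
events = [{"uuid": "a", "value": 1},
          {"uuid": "b", "value": 2},
          {"uuid": "a", "value": 1}]
applied = [process_exactly_once(e) for e in events]
# applied == [True, True, False]; store["results"] == [2, 4]
```

The duplicate delivery of `"a"` is detected and dropped, so the effective semantics at the sink are exactly-once even though delivery was at-least-once.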
