I'm trying to use Apache Flink 1.6.0 to read messages from a Kafka topic, transform them, and finally send them to another Kafka topic. I use savepoints to save the state of the application in case of cancellation and restarting. The problem is that after a restart, messages are read in duplicate. The Kafka version is 0.11. Thanks for any helpful comment.
1 Answer
To avoid duplicates, it's necessary to pass Semantic.EXACTLY_ONCE
when setting up the Kafka producer. See the documentation for more details concerning data loss and duplication when working with Kafka.
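A minimal sketch of what this looks like with Flink 1.6's `FlinkKafkaProducer011`, assuming a local broker and an illustrative topic name (`output-topic`); the placeholder source stands in for the real Kafka consumer and transformation from the question:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;

public class ExactlyOnceJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Kafka transactions are committed on checkpoints, so checkpointing
        // must be enabled for EXACTLY_ONCE to take effect.
        env.enableCheckpointing(60_000);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        // Must not exceed the broker's transaction.max.timeout.ms
        // (15 minutes by default).
        props.setProperty("transaction.timeout.ms", "900000");

        // Placeholder source standing in for the real consumer + transformation.
        DataStream<String> transformed = env.fromElements("a", "b", "c");

        transformed.addSink(new FlinkKafkaProducer011<>(
                "output-topic",                                   // assumed topic name
                new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()),
                props,
                FlinkKafkaProducer011.Semantic.EXACTLY_ONCE));

        env.execute("exactly-once sketch");
    }
}
```

Note that EXACTLY_ONCE only defers the *visibility* of records until a checkpoint completes; downstream consumers also need `isolation.level=read_committed` to avoid seeing uncommitted records.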

David Anderson
- Thanks for your answer. When I pass the Semantic.EXACTLY_ONCE parameter to the producer, I get this error message: "Pending record count must be zero at this point: 1" – Nastaran Motavalli Sep 28 '18 at 08:39
- The error was because I had set "value.serializer" to "org.apache.kafka.common.serialization.StringSerializer" instead of "org.apache.kafka.common.serialization.ByteArraySerializer" – Nastaran Motavalli Sep 29 '18 at 09:18
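To make the fix from the comment above concrete, here is a sketch of producer properties consistent with the EXACTLY_ONCE semantic (broker address is an assumption). `FlinkKafkaProducer011` hands already-serialized `byte[]` records to the underlying `KafkaProducer`, so the serializers must stay `ByteArraySerializer`; overriding `value.serializer` with `StringSerializer` triggers the "Pending record count must be zero at this point" error quoted above:

```java
import java.util.Properties;

public class ExactlyOnceProducerProps {
    public static Properties build() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        // Flink serializes records itself, so the Kafka client must
        // receive raw byte arrays -- do not use StringSerializer here.
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        // Keep within the broker's transaction.max.timeout.ms (15 min default).
        props.setProperty("transaction.timeout.ms", "900000");
        return props;
    }
}
```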