I read the documentation on the Kafka website, but after trying to implement a complete minimal example (producer --> Kafka --> consumer) it's still not clear to me how the "consumer state", i.e. the offset, needs to be handled.
Some info:
- I'm using the high-level consumer API (Java)
- My consumer is a simple class with a main method, basically the same as the one on the Kafka "quickstart" page
- I'm using Zookeeper
- I'm using a single broker
Now, the documentation says that the high-level API consumer stores its state in Zookeeper, so I would expect the offset, and therefore the state of the consumer, to be maintained across:
- Kafka broker restarts
- Consumer restarts
But unfortunately it isn't: each time I restart the broker or the consumer, all messages are redelivered. These are probably stupid questions, but:
In case of a Kafka restart: I understood that it is up to the consumer to keep its state, so presumably when the broker (re)starts it redelivers all (!) messages and the consumer decides what to consume... is that right? If so, what happens if I have millions of messages?
In case of a consumer JVM restart: if the state is kept in Zookeeper, why are the messages redelivered? Is it possible that the new JVM has a different consumer "identity"? And in that case, how can I bind it to the previous identity?
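
To make the "identity" question concrete: as far as I can tell, the group id in the consumer configuration is the main thing that could identify the consumer across restarts. Below is a minimal sketch of the configuration I have in mind. It assumes the 0.8-style property names (`zookeeper.connect`, `group.id`, `auto.commit.enable`, `auto.commit.interval.ms`); older releases may spell them differently. The class name, host, port and group name are placeholders for illustration, not my real values.

```java
import java.util.Properties;

public class ConsumerConfigSketch {

    // Builds the properties for the high-level consumer.
    // Property names are assumed to follow the 0.8-style naming.
    static Properties buildProps(String zookeeperConnect, String groupId) {
        Properties props = new Properties();
        // Where the consumer finds Zookeeper (placeholder address in main).
        props.setProperty("zookeeper.connect", zookeeperConnect);
        // The group id; kept constant here so every JVM start uses the same value.
        props.setProperty("group.id", groupId);
        // Ask the consumer to commit offsets periodically.
        props.setProperty("auto.commit.enable", "true");
        props.setProperty("auto.commit.interval.ms", "1000");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical values; the group id is hard-coded rather than generated.
        Properties props = buildProps("localhost:2181", "my-fixed-group");
        props.list(System.out);
    }
}
```

In the real consumer these properties would be passed to `kafka.consumer.ConsumerConfig` and then to `Consumer.createJavaConsumerConnector(...)`; that part is left out so the sketch compiles without the Kafka jar. If the group id were generated at startup (for example from a timestamp or random value), I assume each restart would look like a brand new consumer, which is what I am trying to confirm.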