I don't think the way @Matthias described it is accurate/detailed enough. It is correct, but the root cause of this limitation (which exists for the ksqlDB `CREATE TABLE` syntax as well) goes beyond the sheer fact that keys must be unique for a `KTable`. Uniqueness in itself doesn't limit `KTable`s. After all, any underlying topic can, and often does, contain messages with the same keys.
A `KTable` has no problem with that: it simply enforces the latest state for each key. This has multiple consequences, including the fact that a `KTable` built from an aggregation function can produce several messages into its output topic based on a single input message. But let's get back to your question.
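To illustrate the "latest state per key" semantic, here is a plain-Python sketch (not real Kafka code, just a simulation of the idea): replaying a topic that contains duplicate keys and keeping only the last value seen per key is essentially how a `KTable` materializes its state.

```python
# Plain-Python sketch of KTable materialization: replay a changelog
# (records in offset order) and keep only the latest value per key.
# This simulates the idea only; it is not a Kafka API.
def materialize(changelog):
    """Return the latest value per key, like a KTable's state store."""
    state = {}
    for key, value in changelog:
        state[key] = value  # a later record for the same key overwrites the earlier one
    return state

# Duplicate keys in the topic are fine; the last record per key wins.
changelog = [("user-1", "A"), ("user-2", "B"), ("user-1", "C")]
print(materialize(changelog))  # {'user-1': 'C', 'user-2': 'B'}
```

Note that this only works because the replay order is well defined, which is exactly what breaks down with re-keying, as explained below.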
So, a `KTable` needs to know which message for a specific key is the last one, i.e. which one represents the latest state for that key.
What ordering guarantees does Kafka provide? Correct: ordering on a per-partition basis. What happens when messages are re-keyed? Correct: they will be spread across partitions, typically different ones from those of the input messages.
So, the initial messages with the same key were correctly stored by the broker itself in the same partition (assuming you didn't do anything fancy/unwise with a custom `Partitioner`). That way, a `KTable` can always infer the latest state for each key.
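A small sketch of why per-partition ordering gives per-key ordering (plain Python; Kafka's default partitioner actually uses murmur2 hashing, the CRC32 here is just a stand-in for the same idea):

```python
# Sketch of the default-partitioner idea: the partition is derived from a
# hash of the key, so every record with the same key lands in one partition.
# (Kafka actually uses murmur2; zlib.crc32 is a stand-in for illustration.)
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    return zlib.crc32(key) % num_partitions

# Same key -> same partition, every time, so the broker's per-partition
# ordering guarantee becomes a per-key ordering guarantee:
assert partition_for(b"user-1", 6) == partition_for(b"user-1", 6)

# Re-keying a record usually changes the computed partition (unless the
# hashes happen to collide), which is where the guarantee breaks:
print(partition_for(b"user-1", 6), partition_for(b"account-42", 6))
```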
But what happens if the messages are re-keyed inside Kafka Streams application in-flight?
They will be spread across partitions again, but under a different key now. If your application is scaled out and several tasks are working in parallel, you simply can't guarantee that the last message for a new key is actually the last message as it was stored in the original topic. Separate tasks don't have any such coordination, and they can't: it wouldn't be efficient otherwise.
As a result, a `KTable` would lose its main semantic guarantee if such re-keying were allowed.