Say I have a stream of employees, keyed by empId, which also includes departmentId.
I want to aggregate by department. So I do a selectKey(mapper to get departmentId), then groupByKey() (or I could just do a groupBy(...), I assume), and then, say, count(). What exactly happens? I gather that it does a "repartition". I think what happens is that it writes to an "internal" topic, which is just a regular topic with a derived name, created automatically. That topic is shared by all instances of the stream, not local to one instance. So the aggregation covers all records with the new key, not just those messages from the source stream instance (I think). Is that correct?
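To make my mental model concrete, here is a rough plain-Java sketch of what I imagine happening. It uses no Kafka APIs at all: the Employee class, the method names, and the hashCode-based partitionFor are all made up for illustration (I would not assume Kafka's real partitioner hashes this way), and the lists simply stand in for topic partitions. The point is only that after re-keying, every record for a given department lands in the same partition of the shared topic, whichever source partition it came from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of re-keying and repartitioning; no Kafka APIs are used.
class RepartitionSketch {

    // Hypothetical record type: an employee who belongs to one department.
    static final class Employee {
        final String empId;
        final String departmentId;

        Employee(String empId, String departmentId) {
            this.empId = empId;
            this.departmentId = departmentId;
        }
    }

    // Stand-in partitioner: the same key always maps to the same partition.
    // This is an assumption for illustration, not Kafka's actual hash function.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    static List<List<Employee>> emptyPartitions(int numPartitions) {
        List<List<Employee>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        return partitions;
    }

    // Source topic: records are spread over partitions by empId.
    static List<List<Employee>> partitionByEmpId(List<Employee> employees, int numPartitions) {
        List<List<Employee>> source = emptyPartitions(numPartitions);
        for (Employee e : employees) {
            source.get(partitionFor(e.empId, numPartitions)).add(e);
        }
        return source;
    }

    // Repartition step: every source partition (think: every instance) writes
    // into one shared set of partitions, now chosen by departmentId.
    static List<List<Employee>> repartitionByDepartment(List<List<Employee>> source, int numPartitions) {
        List<List<Employee>> repartitioned = emptyPartitions(numPartitions);
        for (List<Employee> sourcePartition : source) {
            for (Employee e : sourcePartition) {
                repartitioned.get(partitionFor(e.departmentId, numPartitions)).add(e);
            }
        }
        return repartitioned;
    }

    // Count per partition, then combine. A department must never show up in
    // two partitions; if it does, the repartitioning model is broken.
    static Map<String, Long> countByDepartment(List<List<Employee>> repartitioned) {
        Map<String, Long> total = new TreeMap<>();
        for (List<Employee> partition : repartitioned) {
            Map<String, Long> local = new TreeMap<>();
            for (Employee e : partition) {
                local.merge(e.departmentId, 1L, Long::sum);
            }
            for (Map.Entry<String, Long> entry : local.entrySet()) {
                if (total.putIfAbsent(entry.getKey(), entry.getValue()) != null) {
                    throw new IllegalStateException("department split across partitions: " + entry.getKey());
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Employee> employees = List.of(
                new Employee("e1", "sales"),
                new Employee("e2", "eng"),
                new Employee("e3", "sales"),
                new Employee("e4", "eng"),
                new Employee("e5", "eng"));
        List<List<Employee>> source = partitionByEmpId(employees, 3);
        List<List<Employee>> repartitioned = repartitionByDepartment(source, 3);
        System.out.println(countByDepartment(repartitioned)); // prints {eng=3, sales=2}
    }
}
```

If that picture is right, the counts are complete per department because the grouping happens on the shared, re-keyed partitions rather than on whatever each instance happened to read from the source.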
I've not found a comprehensive description of repartitioning. Can anybody point me to a good article on this?