In my joins, I am finding that the 2nd block below gives the expected result, whereas the 1st block does not and never invokes the joiner (aValue, bValue) -> myFunc(aValue, bValue). I didn't think the actual record key mattered as long as I selected the right field to join on with (aKey, aValue) -> aValue.get("someField").asText(), but calling .selectKey((aKey, aValue) -> aValue.get("someField").asText()) beforehand makes the join go through correctly. I have also seen some cases that did not require the selectKey. Can someone explain the difference?

// does not join correctly and gives unexpected result
KStream<String, JsonNode> c = a
    .leftJoin(b,
        (aKey, aValue) -> aValue.get("someField").asText(),
        (aValue, bValue) -> myFunc(aValue, bValue)
    );

// does join correctly and gives expected result
KStream<String, JsonNode> c = a
    .selectKey((aKey, aValue) -> aValue.get("someField").asText())
    .leftJoin(b,
        (aKey, aValue) -> aKey,
        (aValue, bValue) -> myFunc(aValue, bValue)
    );