
We have two database tables whose updates are streamed to Kafka topics by a CDC application. We keep each row's latest version in KTables, join the two tables, and write the joined result to another Kafka topic on every update. Our code looks like this:

// Keep only the latest version of each row, keyed by the entity id
pTable = builder.stream(pTopic, pKeyDbOperationEventConsumed)
        .selectKey((key, value) -> key.getId())
        .mapValues(pMapFunc)
        .groupByKey(stringPGrouped)
        .reduce((aggValue, newValue) -> newValue, pMaterializedAs);

// Same for the second table, keyed by the composite "pId-sId"
sPTable = builder.stream(sPTopic, keyDbOperationEventConsumed)
        .selectKey((key, value) -> key.getPId() + "-" + key.getSId())
        .mapValues(spMapFunc)
        .groupByKey(stringSPGrouped)
        .reduce((aggValue, newValue) -> newValue, sPMaterializedAs);

// Foreign-key join: each sP row is matched to its parent p row via pId,
// and every joined update is written to the output topic
sPTable.join(pTable, (sp) -> sp.getPId().toString(), joinerFunc)
        .toStream()
        .to(upstreamTopic, producedWithFunc);

It worked well with small data in our local environment, but we couldn't make it work in production. Our setup: 5 pods, and both topics have 30 partitions. With the default configs it started processing, but got stuck after handling only a small amount of data, and we saw the log "Attempt to heartbeat failed since group is rebalancing". Then we changed these configs (a sketch of how they are applied follows the list):

- max.poll.interval.ms = 3600000
- request.timeout.ms = 7200000
- session.timeout.ms = 900000
- num.stream.threads = 6 
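
For reference, here is a minimal sketch of how settings like these would be passed to Kafka Streams (the application id and bootstrap servers are placeholders, not our real values); consumerPrefix() just namespaces a key as consumer.max.poll.interval.ms and so on, which is how Streams forwards configs to its embedded consumers:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cdc-join-app");   // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 6);
// Consumer-level timeouts are forwarded to the embedded consumers
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG), 3600000);
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG), 7200000);
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG), 900000);

KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();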

But no luck; it couldn't even process a single record, and we encountered this broker log: "Member x in group x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)"

My first question is whether our use case is valid. If it is, how can we trace the underlying problem?
