
I'm trying to join data from two topics, person and address, where one person can have multiple addresses. The data published to the topics looks like this:

// person with id as key
{"id": "123", "name": "Tom Tester"}

// addresses with id as key
{"id": "321", "person_id": "123", "address": "Somestreet 12, 4321 Somewhere"}
{"id": "432", "person_id": "123", "address": "Otherstreet 12, 5432 Nowhere"}

After the join I would like to have an aggregated output (to be indexed in Elasticsearch) that should look something like this:

{
  "id": "123",
  "name": "Tom Tester",
  "addresses": [
    {
      "id": "321",
      "address": "Somestreet 12, 4321 Somewhere"
    },
    {
      "id": "432",
      "address": "Otherstreet 12, 5432 Nowhere"
    }
  ]
}

Whenever the person or address topic gets an update, the aggregated person should also be updated. Currently I only get updates on the aggregated person when addresses are published, but not when the person itself is changed. Any ideas what is wrong with this code?

@SpringBootApplication
@EnableBinding(PersonAggregatorBinding.class)
public class KafkaStreamTestApplication {

    public static void main(String[] args) {
        SpringApplication.run(KafkaStreamTestApplication.class, args);
    }

    private static final Logger LOG = LoggerFactory.getLogger(KafkaStreamTestApplication.class);

    @StreamListener
    @SendTo("person-aggregation")
    public KStream<String, PersonAggregation> process(
            @Input("person-input") KTable<String, Person> personInput,
            @Input("address-input") KTable<String, Address> addressInput) {
        // re-key the addresses by person_id and collect them into one
        // AddressAggregation per person
        KTable<String, AddressAggregation> addressAggregate = addressInput.toStream()
                .peek((key, value) -> LOG.info("addr {}: {}", key, value))
                .groupBy((k, v) -> v.getPersonId(), Grouped.with(null, new AddressSerde()))
                .aggregate(
                        AddressAggregation::new,
                        (key, value, aggregation) -> {
                            aggregate(aggregation, value);
                            return aggregation;
                        }, Materialized.with(Serdes.String(), new AddressAggregationSerde()));

        // debug output of the per-person address aggregate
        addressAggregate.toStream()
                .peek((key, value) -> LOG.info("aggregated addr: {}", value));

        // stream-table join: persons are joined against the current state
        // of the address aggregate
        return personInput.toStream()
                .leftJoin(addressAggregate, this::join,
                        Joined.with(Serdes.String(), new PersonSerde(), new AddressAggregationSerde()))
                .peek((key, value) -> LOG.info("aggregated person: {}", value));
    }

    private PersonAggregation join(Person person, AddressAggregation addrs) {
        return PersonAggregation.builder()
                .id(person.getId())
                .name(person.getName())
                .addresses(addrs)
                .build();
    }

    // replaces any previous version of the address in the aggregation and
    // drops it entirely if it is no longer valid
    public void aggregate(AddressAggregation aggregation, Address address) {
        if (address != null) {
            aggregation.removeIf(it -> Objects.equals(it.getId(), address.getId()));
            if (address.isValid()) {
                aggregation.add(address);
            }
        }
    }
}
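
For reference, a minimal sketch of the PersonAggregatorBinding interface referenced above (it is not shown in the question, so this is a hypothetical reconstruction; the channel names are taken from the @Input and @SendTo annotations):

import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.springframework.cloud.stream.annotation.Input;
import org.springframework.cloud.stream.annotation.Output;

// hypothetical reconstruction: the question does not show this interface
public interface PersonAggregatorBinding {

    @Input("person-input")
    KTable<String, Person> personInput();

    @Input("address-input")
    KTable<String, Address> addressInput();

    @Output("person-aggregation")
    KStream<String, PersonAggregation> personAggregation();
}
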
m-kay
  • Maybe record caching? Try disabling the `KTable` caches: https://docs.confluent.io/platform/current/streams/developer-guide/memory-mgmt.html – Matthias J. Sax Feb 28 '21 at 19:14
  • Unfortunately this didn't help. Since it has been a while since I asked this question, I had to set up the example again, and now I see different behavior: I only get updates in the aggregation when a person is updated. – m-kay Mar 02 '21 at 05:56
  • 1
    Well, you do a stream-table join, so what you observe is expected. Checkout: https://www.confluent.io/blog/crossing-streams-joins-apache-kafka/ -- I guess you want to use a table-table join instead -> `personInput.leftJoin(addressAggregate)` -- why do you convert the `personInput` to a KStream before the join? – Matthias J. Sax Mar 02 '21 at 17:10
  • Also: https://www.confluent.io/kafka-summit-ny19/zen-and-the-art-of-streaming-joins/ and https://www.confluent.io/resources/kafka-summit-2020/the-flux-capacitor-of-kafka-streams-and-ksqldb/ – Matthias J. Sax Mar 02 '21 at 17:12
  • Yes, you are right, the table-table join works as expected. However, I could not use Spring's KTable auto-binding directly because the Serdes are configured incorrectly. I only managed to make it work by using the KStream binding and creating the KTable in my code with `personInput.toTable(Materialized.with(Serdes.String(), new PersonSerde()))` (see the sketch after these comments). – m-kay Mar 04 '21 at 05:13
  • My follow-up question is how I could use the addresses as a GlobalKTable. Since the persons and addresses are not co-partitioned and I need all the addresses joined with the person, I would need a GlobalKTable, but a GlobalKTable can only be joined with a KStream, not with a KTable. – m-kay Mar 04 '21 at 05:16
  • Not sure about Spring... (don't know how it really works). -- GlobalKTable-KTable joins are a missing feature in Kafka Streams (cf. https://issues.apache.org/jira/browse/KAFKA-4628). However, since Kafka 2.4.0, foreign-key KTable-KTable joins are supported, which should address this issue (sketch below): https://issues.apache.org/jira/browse/KAFKA-3705 – Matthias J. Sax Mar 08 '21 at 23:09
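
Putting the comments together, a minimal sketch of the approach that worked: bind the inputs as KStreams, build the person KTable in code with explicit serdes as in the workaround above, then use a table-table left join so that an update on either side emits a new aggregated record. This assumes personInput is now bound as a KStream<String, Person>; addressAggregate is the KTable built in the question:

        // build the person KTable manually with explicit serdes
        KTable<String, Person> personTable = personInput
                .toTable(Materialized.with(Serdes.String(), new PersonSerde()));

        // table-table join: an update to either the person or the address
        // aggregate now produces a new aggregated person record
        return personTable
                .leftJoin(addressAggregate, this::join)
                .toStream()
                .peek((key, value) -> LOG.info("aggregated person: {}", value));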
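
For the co-partitioning concern raised in the last comments, a sketch of the foreign-key KTable-KTable join available since Kafka 2.4.0 (KAFKA-3705). This assumes addressTable is the address input as a KTable<String, Address> keyed by address id, and PersonAddress is a hypothetical pair type; Kafka Streams re-partitions the address side by the extracted person id internally, so the two topics do not need to be co-partitioned:

        // join each address with its person via the person_id foreign key;
        // the result stays keyed by the address id
        KTable<String, PersonAddress> joined = addressTable.join(
                personTable,
                Address::getPersonId,  // foreign-key extractor
                (address, person) -> new PersonAddress(person, address));

The per-person address list can then be rebuilt by grouping this table by person id and aggregating, analogous to the AddressAggregation in the question.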

0 Answers