I've been trying to aggregate some stream data into a KTable using Kafka Streams. The JSON from my topic looks like this:

{
   "id": "d04a6184-e805-4ceb-9aaf-b2ab0139ee84",
   "person": {
      "id": "d04a6184-e805-4ceb-9aaf-b2ab0139ee84",
      "createdBy": "user",
      "createdDate": "2023-01-01T00:28:58.161Z",
      "name": "person 1",
      "description": "test1"
   }
}
....

// Keep only the nested "person" object of each event as the record value.
KStream<Object, String> firstStream = builder.stream("topic-1").mapValues(value -> {
  JSONObject json = new JSONObject(String.valueOf(value));
  return json.getJSONObject("person").toString();
});

I get something like this:

null{"createdDate":"2023-01-01T00:28:58.161Z","createdBy":"user","name":"person 1","description":"test1","id":"d04a6184-e805-4ceb-9aaf-b2ab0139ee84"}
null{"createdDate":"2023-01-01T00:29:07.862Z","createdBy":"user","name":"person 2","description":"test 2","id":"48d8b895-eb27-4977-9dbc-adb8fbf649d8"}
null{"createdDate":"2023-01-01T00:29:12.261Z","createdBy":"anonymousUser","name":"person 2","description":"test 2 updated","id":"d8b895-eb27-4977-9dbc-adb8fbf649d8"}

I want to group this data so that person 1 holds the single JSON associated with it, and person 2 holds a list of both JSONs associated with it.

I have checked the question "Kafka Streams API GroupBy behaviour", which describes the same problem, but the solution given there doesn't work for me. Do I have to perform any extra operations? Please help.

OneCricketeer

1 Answer

In order to use groupBy, you need a key to group on. So, use map (or selectKey) to extract the name of each person as the record key.

Then, as the linked answer says, you need to aggregate after grouping to "combine data per person", across events.
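
To make the re-key, group, and aggregate steps concrete, here is a minimal plain-Java sketch that simulates them over a fixed list of events, without any Kafka dependency. Everything in it is an assumption for illustration: the class `PersonGrouping`, the helpers `nameOf`, `append`, and `groupByName`, the choice of a list of JSON strings as the aggregate, and the regex-based name lookup (used only to keep the sketch dependency-free; a JSON parser is the better choice in real code).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PersonGrouping {
    // Hypothetical helper pattern: matches the "name" field of a flat person JSON string.
    private static final Pattern NAME = Pattern.compile("\"name\"\\s*:\\s*\"([^\"]*)\"");

    // Key extractor: the function one would hand to map/selectKey.
    static String nameOf(String personJson) {
        Matcher m = NAME.matcher(personJson);
        if (!m.find()) {
            throw new IllegalArgumentException("no name field: " + personJson);
        }
        return m.group(1);
    }

    // Aggregator with the (key, value, aggregate) -> new aggregate shape:
    // returns a copy of the list so far with the new event appended.
    static List<String> append(String name, String personJson, List<String> soFar) {
        List<String> next = new ArrayList<>(soFar);
        next.add(personJson);
        return next;
    }

    // Simulates "re-key by name, group by key, aggregate" over a finite list of events.
    static Map<String, List<String>> groupByName(List<String> events) {
        Map<String, List<String>> table = new LinkedHashMap<>();
        for (String event : events) {
            String name = nameOf(event);
            List<String> soFar = table.getOrDefault(name, new ArrayList<>());
            table.put(name, append(name, event, soFar));
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> events = List.of(
            "{\"name\":\"person 1\",\"description\":\"test1\"}",
            "{\"name\":\"person 2\",\"description\":\"test 2\"}",
            "{\"name\":\"person 2\",\"description\":\"test 2 updated\"}");
        // Print one line per person with all events collected for that name.
        groupByName(events).forEach((name, jsons) -> System.out.println(name + " -> " + jsons));
    }
}
```

In a Kafka Streams topology, `nameOf` would play the role of the function passed to map or selectKey, and `append` the role of the aggregator passed to aggregate after grouping. The aggregate type would also need its own Serde, which this sketch leaves out.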

By the way, you should set up the Streams config with a JSON Serde for values, rather than the String Serde, to reduce the need to manually parse each event.

OneCricketeer
  • Sounds insightful. Is there some kind of reference, especially for the config with JsonSerde? I only know that serdes are related to serialization and deserialization. – Anant Majhi Jan 03 '23 at 15:22
  • Import the JSON serializer and deserializer classes from the `org.apache.kafka.connect.json` package (from the `org.apache.kafka:connect-json` dependency), then create `Serde jsonSerde = Serdes.serdeFrom(jsonSerializer, jsonDeserializer);` – OneCricketeer Jan 03 '23 at 15:29