I am currently building a POC for my company on ConfluentCloud. The current version of KSQLDB does not yet support collect_list/collect_set on multiple columns, a struct or a map; therefore, I am trying to think of a workaround.
I am consuming SQL CDC streams, and trying to compose a nested object model with parent child relationships without the need to build a self hosted JAVA UDF or KSTREAM app.
Streams demo_games, demo_players and demo_teams should in the end yield the following model written to a kafka topic.
{
teamId: bigint,
teamName: string,
teamPlayers: [
{
playerid: bigint,
playername:string
}
],
teamGames: [
{
gameid: bigint,
gamename: string
}
]
}
Let's start with a smidge of code to illustrate what I am trying to achieve.
CREATE STREAM DEMO_GAMES( GAMEID BIGINT KEY, TEAMID BIGINT, GAMENAME STRING )
WITH (KAFKA_TOPIC='DEMO.GAMES',VALUE_FORMAT='JSON', PARTITIONS=1);
INSERT INTO DEMO_GAMES( GAMEID, TEAMID, GAMENAME) VALUES (1,1,'SUNDAY');
INSERT INTO DEMO_GAMES( GAMEID, TEAMID, GAMENAME) VALUES (2,1,'MONDAY');
INSERT INTO DEMO_GAMES( GAMEID, TEAMID, GAMENAME) VALUES (3,1,'FRIDAY');
CREATE STREAM DEMO_PLAYERS( PLAYERID BIGINT KEY, TEAMID BIGINT, PLAYERNAME STRING )
WITH (KAFKA_TOPIC='DEMO.PLAYERS',VALUE_FORMAT='JSON', PARTITIONS=1);
INSERT INTO DEMO_PLAYERS( PLAYERID, TEAMID, PLAYERNAME) VALUES (1,1,'PLAYER 1');
INSERT INTO DEMO_PLAYERS( PLAYERID, TEAMID, PLAYERNAME) VALUES (2,1,'PLAYER 2');
INSERT INTO DEMO_PLAYERS( PLAYERID, TEAMID, PLAYERNAME) VALUES (3,1,'PLAYER 3');
INSERT INTO DEMO_PLAYERS( PLAYERID, TEAMID, PLAYERNAME) VALUES (4,1,'PLAYER 4');
CREATE STREAM DEMO_TEAMS( TEAMID BIGINT KEY,TEAMNAME STRING )
WITH (KAFKA_TOPIC='DEMO.TEAMS',VALUE_FORMAT='JSON', PARTITIONS=1);
INSERT INTO DEMO_TEAMS( TEAMID, TEAMNAME) VALUES (1,'THE TEAM');
#create a few persistent queries...
create stream demo_team_players as
select
teamid,
playerid,
struct(playerid:=playerid,
playername:=playername ) `model`
from DEMO_PLAYERS emit changes;
create stream demo_team_games as
select
teamid,
gameid,
struct(gameid:=gameid,
gamename:=gamename) `model`
from DEMO_games emit changes;
The two persistent queries above wrap the data I want to include in collect_list into a struct. So now I can execute the following query.
select teamid, transform(collect_list( cast(`model` as string)), t=>t) as teamplayers from DEMO_TEAM_GAMES group by teamid emit changes;
#yields
{
"TEAMID": 1,
"TEAMPLAYERS": [
"Struct{GAMEID=1,GAMENAME=SUNDAY}",
"Struct{GAMEID=2,GAMENAME=MONDAY}",
"Struct{GAMEID=3,GAMENAME=FRIDAY}"
]
}
My question is this. Is there a way to take a "serialized" STRUCT string and convert it back to a STRUCT within the transform lambda?
I also tried building a dynamic JSON string as a parameter to the collection_list function and then build a struct using EXTRACT_JSON_FIELD. This seems very brittle and to get the JSON string, I am forced to cast all values to string.