I am investigating some things for work where we have to prove the value of Kafka and KSQL as part of a POC. We are working on simple game events where a player can make a play and the game engine either assigns them points (considered a win) or they get nothing (considered a loss). One of our requirements is to prove the feasibility of KSQL on consecutive losses or wins. For the purposes of this question, I will focus on consecutive losses.
We have two tables, one for wins and one for losses that run against the same stream of game events.
The wins DDL is as follows:
CREATE TABLE TABLE__GAME_EVENTS__WINS AS
SELECT userId,
gameName,
COUNT(*) AS totalWins
FROM STREAM__GAME_EVENTS
WINDOW TUMBLING (SIZE 30 MINUTES)
WHERE pointsAllocated > 0
GROUP BY userId,
gameName;
The losses DDL is as follows:
CREATE TABLE TABLE__GAME_EVENTS__LOSSES AS
SELECT userId,
gameName,
COUNT(*) AS totalLosses
FROM STREAM__GAME_EVENTS
WINDOW TUMBLING (SIZE 30 MINUTES)
WHERE pointsAllocated = 0
GROUP BY userId,
gameName;
In order to achieve consecutive losses in the 30 minute window period, our current approach is to JOIN
the WINS
table onto the LOSSES
table and only fire when there is nothing in the WINS
table and something in the LOSSES
table.
To do that, we have the following code:
SELECT losses.userId,
losses.gameName,
losses.totalLosses
FROM TABLE__GAME_EVENTS__LOSSES losses
JOIN TABLE__GAME_EVENTS__WINS wins
ON wins.userId = losses.userId
AND wins.gameName = losses.gameName
WHERE losses.totalLosses >= 1
AND wins.totalWins = 0;
When I try to execute that code I get the following error:
io.confluent.ksql.parser.tree.LogicalBinaryExpression cannot be cast to io.confluent.ksql.parser.tree.ComparisonExpression
Initially I thought it was to do with the WHERE
clause but after looking at Hojjat's answer to a similar question, it appears to be that the issue has to do with the AND
statement in the JOIN
.
When I ran DESCRIBE EXTENDED
on both tables I get the following (same for both):
ksql> DESCRIBE EXTENDED TABLE__GAME_EVENTS__WINS;
Name : TABLE__GAME_EVENTS__WINS
Type : TABLE
Key field : KSQL_INTERNAL_COL_0|+|KSQL_INTERNAL_COL_1
Key format : STRING
Timestamp field : Not set - using <ROWTIME>
Value format : JSON
Kafka topic : TABLE__GAME_EVENTS__WINS (partitions: 4, replication: 1)
Field | Type
---------------------------------------
ROWTIME | BIGINT (system)
ROWKEY | VARCHAR(STRING) (system)
USERID | INTEGER
GAMENAME | VARCHAR(STRING)
TOTALWINS | BIGINT
---------------------------------------
Given that COL_0
- userId
and COL_1
- gameName
are the key fields on both tables, it seems logical that they have to be included in the JOIN
. Is it possible to get around this or is it currently not possible to JOIN
on multiple columns?