3

Is it possible to join more than two streams/tables in KSQL?

Example:

I have three streams:

CREATE STREAM StreamA (id BIGINT, message VARCHAR) WITH 
(KAFKA_TOPIC='TopicA', VALUE_FORMAT='DELIMITED');
CREATE STREAM StreamB (id BIGINT, aid BIGINT, message VARCHAR) WITH . 
(KAFKA_TOPIC='TopicB', VALUE_FORMAT='DELIMITED');
CREATE STREAM StreamC (id BIGINT, bid BIGINT, message VARCHAR) WITH 
(KAFKA_TOPIC='TopicC', VALUE_FORMAT='DELIMITED');

I try to create another stream by joining those three streams:

CREATE STREAM ABCStream AS SELECT * FROM StreamA a JOIN 
StreamB b ON b.aid = a.id JOIN StreamC c WITHIN 1 HOURS ON 
c.bid = b.id; 

I get the following exception:

mismatched input 'JOIN' expecting ';'  
Caused by: org.antlr.v4.runtime.InputMismatchException
Robin Moffatt
  • 30,382
  • 3
  • 65
  • 92
Tim Coombe
  • 33
  • 1
  • 4

1 Answers1

5

No, you can only join two per query in KSQL up to v5.0. You'd need to daisy-chain your queries, something like this:

Intermediate Stream:

CREATE STREAM ABStream AS \
   SELECT * \
     FROM StreamA a \
     JOIN StreamB b \
          ON b.aid = a.id;

Multi- join stream

CREATE STREAM ABCStream AS \
   SELECT * \
     FROM ABStream AB \
     JOIN StreamC c \
          WITHIN 1 HOURS \
          ON c.bid = AB.b_id;
Tim Coombe
  • 33
  • 1
  • 4
Robin Moffatt
  • 30,382
  • 3
  • 65
  • 92
  • I have been trying something similar based on an answer at [This Post](https://stackoverflow.com/questions/51459018/is-it-possible-to-use-multiple-left-join-in-confluent-ksql-query-tried-to-join), but run into a problem with incompatible partitions: Can't join ABSTREAM with STREAMC since the number of partitions don't match. ABSTREAM partitions = 4; STREAMC partitions = 1. Please repartition either one so that the number of partitions match. – Tim Coombe Nov 01 '18 at 08:38
  • It seems a bit clunky, but I got it to work by creating a another stream from StreamC which had the same number of partitions as the intermediate stream. CREATE STREAM StreamCPartitioned WITH (PARTITIONS=4) AS SELECT * FROM StreamC PARTITION BY id; Then joined that stream with the intermediate stream: CREATE STREAM ABCStream AS \ SELECT * \ FROM ABStream AB \ JOIN StreamCPartitioned c \ WITHIN 1 HOURS \ ON c.bid = AB.b_id; – Tim Coombe Nov 01 '18 at 10:35
  • Yeah, I got bit by this too. My source topics (A and B in this example) had 1 partition each and KSQL defaults to 4 partitions for the topics it creates (C in this example) which makes them incompatible for joining other source partitions (e.g. D). You have to specify the number of partitions in the WITH section in KSQL when defining streams to make sure they all have the same number of partitions. – Raymond Saltrelli Aug 09 '19 at 18:22