I need to create a Kafka topic from a combination of nine other topics, all of them produced by the Debezium PostgreSQL source connector in AVRO format. To start, I'm trying (so far unsuccessfully) to combine fields from only two of those topics.
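For reference, the source connectors are created along these lines. Everything below (names, hosts, credentials) is a placeholder rather than my actual configuration, and the topic routing/renaming settings are omitted:
ksql> CREATE SOURCE CONNECTOR REQUEST_SOURCE WITH (
    "connector.class" = 'io.debezium.connector.postgresql.PostgresConnector',
    "database.hostname" = 'postgres',
    "database.port" = '5432',
    "database.user" = 'postgres',
    "database.password" = '********',
    "database.dbname" = 'mydb',
    "database.server.name" = 'dbserver',
    "table.include.list" = 'public.request',
    "key.converter" = 'io.confluent.connect.avro.AvroConverter',
    "key.converter.schema.registry.url" = 'http://schema-registry:8081',
    "value.converter" = 'io.confluent.connect.avro.AvroConverter',
    "value.converter.schema.registry.url" = 'http://schema-registry:8081');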
So, first I create a ksqlDB table based on the "REQUEST" topic:
ksql> CREATE TABLE TB_REQUEST (ID STRUCT<REQUEST_ID BIGINT> PRIMARY KEY)
WITH (KAFKA_TOPIC='REQUEST', FORMAT='AVRO');
And everything seems fine to me:
ksql> DESCRIBE TB_REQUEST;
Name : TB_REQUEST
Field | Type
-----------------------------------------------------------------------------------------------------------------------
ID | STRUCT<REQUEST_ID BIGINT> (primary key)
BEFORE | STRUCT<REQUEST_ID BIGINT, REQUESTER_ID INTEGER, STATUS_ID>
AFTER | STRUCT<REQUEST_ID BIGINT, REQUESTER_ID INTEGER, STATUS_ID>
SOURCE | STRUCT<VERSION VARCHAR(STRING), CONNECTOR VARCHAR(STRING), NAME VARCHAR(STRING), TS_MS BIGINT, SNAPSHOT VARCHAR(STRING), DB VARCHAR(STRING), SEQUENCE VARCHAR(STRING), SCHEMA VARCHAR(STRING), TABLE VARCHAR(STRING), TXID BIGINT, LSN BIGINT, XMIN BIGINT>
OP | VARCHAR(STRING)
TS_MS | BIGINT
TRANSACTION | STRUCT<ID VARCHAR(STRING), TOTAL_ORDER BIGINT, DATA_COLLECTION_ORDER BIGINT>
-----------------------------------------------------------------------------------------------------------------------
For runtime statistics and query details run: DESCRIBE <Stream,Table> EXTENDED;
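In case it matters, the rows in the table can be eyeballed with a quick push query like the one below (I haven't pasted its output here):
ksql> SELECT ID->REQUEST_ID, AFTER->REQUESTER_ID, AFTER->STATUS_ID
FROM TB_REQUEST
EMIT CHANGES
LIMIT 5;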
Then I create another table from the "EMPLOYEE" topic:
ksql> CREATE TABLE TB_EMPLOYEE (ID STRUCT<EMPLOYEE_ID INT> PRIMARY KEY)
WITH (KAFKA_TOPIC='EMPLOYEE', FORMAT='AVRO');
Again, everything seems ok.
ksql> DESCRIBE TB_EMPLOYEE;
Name : TB_EMPLOYEE
Field | Type
-----------------------------------------------------------------------------------------------------------------------
ID | STRUCT<EMPLOYEE_ID INTEGER> (primary key)
BEFORE | STRUCT<EMPLOYEE_ID INTEGER, NAME VARCHAR(STRING), HIRING_DATE DATE>
AFTER | STRUCT<EMPLOYEE_ID INTEGER, NAME VARCHAR(STRING), HIRING_DATE DATE>
SOURCE | STRUCT<VERSION VARCHAR(STRING), CONNECTOR VARCHAR(STRING), NAME VARCHAR(STRING), TS_MS BIGINT, SNAPSHOT VARCHAR(STRING), DB VARCHAR(STRING), SEQUENCE VARCHAR(STRING), SCHEMA VARCHAR(STRING), TABLE VARCHAR(STRING), TXID BIGINT, LSN BIGINT, XMIN BIGINT>
OP | VARCHAR(STRING)
TS_MS | BIGINT
TRANSACTION | STRUCT<ID VARCHAR(STRING), TOTAL_ORDER BIGINT, DATA_COLLECTION_ORDER BIGINT>
-----------------------------------------------------------------------------------------------------------------------
For runtime statistics and query details run: DESCRIBE <Stream,Table> EXTENDED;
But when I tried to create my target table by joining the previous two on Employee Id:
ksql> CREATE TABLE REQUEST_EMPLOYEE AS
SELECT RQ.ID->REQUEST_ID, RQ.AFTER->REQUESTER_ID, RQ.AFTER->STATUS_ID, EM.ID->EMPLOYEE_ID, EM.AFTER->NAME AS REQUESTER
FROM TB_REQUEST RQ
JOIN TB_EMPLOYEE EM ON RQ.AFTER->REQUESTER_ID = EM.ID->EMPLOYEE_ID;
I got the following error:
Could not determine output schema for query due to error: Invalid join condition: table-table joins require to join on the primary key of the right input table. Got RQ.AFTER->REQUESTER_ID = EM.ID->EMPLOYEE_ID.
Statement: CREATE TABLE REQUEST_EMPLOYEE WITH (KAFKA_TOPIC='REQUEST_EMPLOYEE', PARTITIONS=1, REPLICAS=1) AS SELECT
RQ.ID->REQUEST_ID REQUEST_ID,
RQ.AFTER->REQUESTER_ID REQUESTER_ID,
RQ.AFTER->STATUS_ID STATUS_ID,
EM.ID->EMPLOYEE_ID EMPLOYEE_ID,
EM.AFTER->NAME REQUESTER
FROM TB_REQUEST RQ
INNER JOIN TB_EMPLOYEE EM ON ((RQ.AFTER->REQUESTER_ID = EM.ID->EMPLOYEE_ID))
EMIT CHANGES;
Looking at the output of the "DESCRIBE TB_EMPLOYEE" command, it seems to me that "EM.ID->EMPLOYEE_ID" is the right choice. What am I missing?
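The only workaround I've come up with so far is to re-key the employee data on the plain integer id, going through an intermediate stream and an aggregation, so that the join can target that table's primary key directly. A rough, untested sketch of what I mean (ST_EMPLOYEE and TB_EMPLOYEE_BY_ID are names I made up):
-- re-read the EMPLOYEE topic as a stream, keeping the Debezium struct key
ksql> CREATE STREAM ST_EMPLOYEE (ID STRUCT<EMPLOYEE_ID INT> KEY)
WITH (KAFKA_TOPIC='EMPLOYEE', FORMAT='AVRO');
-- re-key it into a table whose primary key is the plain INT employee id
ksql> CREATE TABLE TB_EMPLOYEE_BY_ID AS
SELECT AFTER->EMPLOYEE_ID AS EMPLOYEE_ID,
       LATEST_BY_OFFSET(AFTER->NAME) AS NAME
FROM ST_EMPLOYEE
GROUP BY AFTER->EMPLOYEE_ID
EMIT CHANGES;
The idea is that the join could then reference EM.EMPLOYEE_ID, which is that table's primary key. But that means a lot of extra plumbing across nine topics, so I'd rather understand what is wrong with the original statement.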
Thanks in advance.
PS: ksqlDB version is 0.21.0