3

we are using kafka in production and I try to push the adoption and usage of KSQL in the same direction. But I already failed with one simple table-table join. I’ve tried with our production data first and ran in an issue. So I thought I missed something and moved back to the example from the confluent docs and ran in the same problem. I will explain my issue with the example data https://docs.confluent.io/current/ksql/docs/tutorials/basics-docker.html#table-table-join When I have created both tables and try to join the data it works, but as soon as I try to alter or add something I get new entries in my table. From every example I found at confluent or even at the youtube videos this is not suppose to happen.

Creating records

docker run --interactive --rm --network tutorials_default \
  confluentinc/cp-kafkacat \
  kafkacat -b kafka:39092 \
          -t warehouse_location \
          -K: \
          -P <<EOF
1:{"warehouse_id":1,"city":"Leeds","country":"UK"}
2:{"warehouse_id":2,"city":"Sheffield","country":"UK"}
3:{"warehouse_id":3,"city":"Berlin","country":"Germany"}
EOF
docker run --interactive --rm --network tutorials_default \
  confluentinc/cp-kafkacat \
  kafkacat -b kafka:39092 \
          -t warehouse_size \
          -K: \
          -P <<EOF
1:{"warehouse_id":1,"square_footage":16000}
2:{"warehouse_id":2,"square_footage":42000}
3:{"warehouse_id":3,"square_footage":94000}
EOF

Creating tables

CREATE TABLE WAREHOUSE_LOCATION (WAREHOUSE_ID INT, CITY VARCHAR, COUNTRY VARCHAR)
WITH (KAFKA_TOPIC='warehouse_location',
      VALUE_FORMAT='JSON',
      KEY='WAREHOUSE_ID');

CREATE TABLE WAREHOUSE_SIZE (WAREHOUSE_ID INT, SQUARE_FOOTAGE DOUBLE)
WITH (KAFKA_TOPIC='warehouse_size',
      VALUE_FORMAT='JSON',
      KEY='WAREHOUSE_ID');

Creating a joined table:

CREATE TABLE WH_U AS SELECT WL.WAREHOUSE_ID, WL.CITY, WL.COUNTRY, WS.SQUARE_FOOTAGE
FROM WAREHOUSE_LOCATION WL
  LEFT JOIN WAREHOUSE_SIZE WS
    ON WL.WAREHOUSE_ID=WS.WAREHOUSE_ID;

With this I get the expected results:

1 | Leeds | UK | 16000.0
2 | Sheffield | UK | 42000.0
3 | Berlin | Germany | 94000.0

But when I add or chnage records, this happens:

1566375174496 | 1 | 1 | Leeds | UK | 16000.0
1566375174496 | 2 | 2 | Sheffield | UK | 42000.0
1566375174496 | 3 | 3 | Berlin | Germany | 94000.0
1566375595372 | 4 | 4 | London | UK | null
1566375641291 | 4 | 4 | London | UK | 94000.0
1566375641291 | 1 | 1 | Leeds | UK | 1.0

I expected:

1566375174496 | 1 | 1 | Leeds | UK | 1.0
1566375174496 | 2 | 2 | Sheffield | UK | 42000.0
1566375174496 | 3 | 3 | Berlin | Germany | 94000.0
1566375641291 | 4 | 4 | London | UK | 94000.0

What am I missing?

SOLVED

The reason for this behaviour was a simple env in ksql server. KSQL_CACHE_MAX_BYTES_BUFFERING was set to 0

General Grievance
  • 4,555
  • 31
  • 31
  • 45
hammi
  • 161
  • 1
  • 8
  • how did you add records? `1566375595372 | 4 | 4 | London | UK | null` means does not join with the other ktable – Holm Aug 21 '19 at 14:38
  • I added the new records with kafkacat kafkacat -P -b 127.0.0.1:29092 -t warehouse_size -K: 4:{"warehouse_id":4,"square_footage":94000} kafkacat -P -b 127.0.0.1:29092 -t warehouse_location -K: 4:{"warehouse_id":4,"city":"London","country":"UK"} The thins is, he joined but with the showed me with the select twice the entries with ID 4. I expected the same results as you can find in the live demo https://www.youtube.com/watch?v=-eMXWeBfK7U – hammi Aug 21 '19 at 18:17
  • You should create an answer for your question, and then mark your own answer as 'Accepted'. – miguno Aug 27 '19 at 10:19

0 Answers0