
I was testing the exactly-once semantics of a ksqlDB server by shutting down the running Docker process very ungracefully, or by letting the Docker container run out of memory. In both cases I receive duplicates, which is definitely not the guaranteed behaviour. I feel like I might be missing something obvious here...

The Docker container has the KSQL_KSQL_STREAMS_PROCESSING_GUARANTEE=exactly_once parameter set. As far as I understand, this should configure enable.idempotence on the underlying producer and isolation.level on the consumers.

Duplicates still appear in the output of the following queries. First here:

create or replace table TEST with (kafka_topic =  'TEST', value_format='avro',partitions=10, replicas=1) 
AS
SELECT 
    CUSTOMERS_ID,
    earliest_by_offset(LDTS) AS LDTS, 
    COLLECT_SET(NAMES) AS NAMES,
    earliest_by_offset(CUSTOMER_PK) AS CUSTOMER_PK
from TEST_1
group by CUSTOMERS_ID
emit changes;

and also here

create or replace stream TEST_STREAM (CUSTOMERS_ID VARCHAR KEY, LDTS BIGINT, NAMES ARRAY<VARCHAR>, CUSTOMER_PK VARCHAR)
WITH 
(KAFKA_TOPIC='TEST', KEY_FORMAT='KAFKA', VALUE_FORMAT='AVRO');

create or replace stream TEST_FINAL (KAFKA_KEY VARCHAR KEY, CUSTOMERS_ID VARCHAR, LDTS BIGINT,NAME VARCHAR, CUSTOMER_PK VARCHAR) WITH
(KAFKA_TOPIC='TEST_FINAL', VALUE_FORMAT='AVRO', partitions=10, replicas=1);

INSERT INTO 
    TEST_FINAL 
select
    CUSTOMERS_ID as KAFKA_KEY,
    AS_VALUE(CUSTOMERS_ID) as CUSTOMERS_ID,
    LDTS,
    NAMES[1] as NAME,
    CUSTOMER_PK
from TEST_STREAM
where
rowtime= LDTS and ARRAY_LENGTH(NAMES)=1;

You can ignore the logic of the SQL; these are just examples to make the question more concrete. The point is that the offset is evidently being lost when the container crashes.

What else can I do? Are there any properties I am missing?
I am using the Kafka broker from Confluent Community v6.2.1 and ksqlDB v0.21.

Nikki

1 Answer


I guess I'll just answer my own question after all. It seems that consumer.isolation.level still needs to be set to "read_committed" in the ksqlDB Docker environment variables. Although everything suggests that processing_guarantee would do this for you, that was not the case for me. Once I added it, I still see uncommitted messages in the topics, but no longer in the ksqlDB streams and tables. Maybe this helps someone else.
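For anyone who wants to try the same thing, here is a rough sketch of what the container environment might look like. The first variable is the one from the question. The second variable name is my assumption: it follows the usual ksqlDB Docker convention of prefixing the property with KSQL_, upper-casing it, and replacing dots with underscores, applied to ksql.streams.consumer.isolation.level. Check it against the documentation for your version before relying on it.

```shell
# Environment for the ksqlDB server container (env-file style, one KEY=value per line).

# From the question: ask Kafka Streams for exactly-once processing.
KSQL_KSQL_STREAMS_PROCESSING_GUARANTEE=exactly_once

# Assumed variable name (derived from ksql.streams.consumer.isolation.level):
# make the consumers read only committed transactional messages.
KSQL_KSQL_STREAMS_CONSUMER_ISOLATION_LEVEL=read_committed
```

After restarting the server, running SHOW PROPERTIES; in the ksqlDB CLI should list the effective values, which is one way to confirm whether the setting was actually picked up.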

Nikki