
In my CREATE TABLE DDL, I have set a watermark on a column and am running a simple count(distinct userId) over a 1-minute tumbling window, but I am still not getting any data. The same simple job works fine in Flink 1.13.

CREATE TABLE test (
    eventName STRING,
    ingestion_time BIGINT,
    time_ltz AS TO_TIMESTAMP_LTZ(ingestion_time, 3),
    props ROW(userId VARCHAR, id VARCHAR, tourName VARCHAR, advertiserId VARCHAR, deviceId VARCHAR, tourId VARCHAR),
    WATERMARK FOR time_ltz AS time_ltz - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kafka',
    'topic' = 'test',
    'scan.startup.mode' = 'latest-offset',
    'properties.bootstrap.servers' = 'localhost:9092',
    'properties.group.id' = 'local_test_flink_115',
    'format' = 'json',
    'json.ignore-parse-errors' = 'true',
    'scan.topic-partition-discovery.interval' = '60000'
);
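For reference, a query of the shape described above might look like the following. This is a hypothetical reconstruction, not the original job: the 1-minute tumble and the count(distinct userId) come from the question text, and the output column names are my own.

```sql
-- Hypothetical reconstruction of the windowed query from the question:
-- count distinct users per 1-minute event-time tumbling window.
SELECT
    TUMBLE_START(time_ltz, INTERVAL '1' MINUTE) AS window_start,
    COUNT(DISTINCT props.userId)                AS distinct_users
FROM test
GROUP BY TUMBLE(time_ltz, INTERVAL '1' MINUTE);
```

The window only fires once the watermark passes the window end, so if watermarks never advance (e.g. an idle partition), this query produces no output at all.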

We have also migrated other jobs, but their output doesn't match either. Is there a watermark-related default setting we need to configure?
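One setting worth checking, in case some Kafka partitions are empty: Flink SQL can mark a source partition as idle after a timeout so that the watermark of the other partitions can still advance. A minimal sketch for the SQL client (the 30 s value is an arbitrary choice for illustration):

```sql
-- Mark a source partition idle after 30 s without data, so watermarks
-- from the remaining partitions can advance and windows can fire.
SET 'table.exec.source.idle-timeout' = '30 s';
```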


4 Answers


I faced this issue when my operators reading from Kafka had lower parallelism than the number of Kafka partitions. In my case, one of the input streams to the stateful operator was empty: the input Kafka topic had no records, and removing that input solved the watermark issue on the stateful operator.

If removing the idle stream is not an option for you, then you'll have to set withIdleness on the WatermarkStrategy for the stream. See the Flink docs that explain this here.

sbrk

It could be an issue with watermark generation. Check in the Flink Web UI whether watermarks appear in the "Watermarks" tab. Also, if you have already fixed this, please post an update; other people might have the same issue.


I was facing the same issue: watermarks were not being generated. I resolved it by decreasing the parallelism, which was higher than the number of Kafka partitions.

  • You could also fix this by adding `.withIdleness(...)` to the WatermarkStrategy so that it will cope with the source instances that don't have data. – David Anderson May 26 '23 at 20:33

Just adding my 2 cents on this question. I had this issue with Flink 1.17 consuming from 2 Kafka data sources. This is the use case described in Watermark Strategies and the Kafka Connector.

I had to set the parallelism of each Flink data source to the number of partitions of its Kafka topic; in my case they were different, so one data source got parallelism 48 and the second got parallelism 7. I also tried a multiple of 7 (42) for the second source, but that didn't work out. I hope it can help someone.
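If the whole job can run at one parallelism, the simplest knob in the SQL client is the job-wide default (48 here is just an example matching a 48-partition topic). Note this sets a single value for every operator; giving each source its own parallelism, as described above, requires the DataStream API instead.

```sql
-- Set the job-wide default parallelism; with event-time windows it helps
-- to keep this no higher than the Kafka partition count, so no source
-- subtask sits idle without data and stalls the watermark.
SET 'parallelism.default' = '48';
```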

Felipe