I tried the SQL query below using a window TVF (table-valued function).
Steps followed:
- Published 1 million records to a Kafka topic within 2 minutes.
- Read the data from Kafka as a table source with a watermark strategy, and ran the query below using a window TVF with a 1-minute tumbling window.
tableEnv.executeSql("CREATE TABLE cdrTable (\r\n"
+ " orgid STRING\r\n"
+ " ,clusterid STRING\r\n"
...
+ " ,rowtime TIMESTAMP(3) METADATA FROM 'timestamp'\r\n"
+ " ,proctime AS PROCTIME()\r\n"
+ " ,WATERMARK FOR rowtime AS rowtime - INTERVAL '1' SECOND\r\n"
+ " )\r\n"
+ " WITH (\r\n"
+ " 'connector' = 'kafka'\r\n"
+ " ,'topic' = 'cdr-direct'\r\n"
+ " ,'properties.bootstrap.servers' = 'localhost:9092'\r\n"
+ " ,'scan.startup.mode' = 'latest-offset'\r\n"
+ " ,'format' = 'json'\r\n"
+ " )");
String sql = "SELECT orgid, clusterid, ... "
+ "FROM (SELECT * FROM TABLE(TUMBLE(TABLE cdrTable, DESCRIPTOR(rowtime), INTERVAL '1' MINUTES))) "
+ "GROUP BY orgid, clusterid, ..., window_start, window_end";
Table order20 = tableEnv.sqlQuery(sql);
order20.executeInsert("outputCdrTable");
The issue is with the output (sink) counts produced by the query above. The total should ideally be 1 million, but every run produces fewer records, and the shortfall varies from run to run: a difference of roughly 10 to 20 percent is observed.
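To make the setup concrete, here is a minimal plain-Java sketch of how I understand the 1-minute tumbling window and the 1-second watermark delay from the DDL to interact. It is a simplified model, not Flink code: the class `LateRecordSketch`, its methods, and the sample timestamps are hypothetical, and it assumes the watermark advances on every record, whereas Flink emits watermarks periodically. In this model a record is dropped when its window has already closed, which might be one source of the missing counts, though I have not confirmed that this is what happens in my job.

```java
public class LateRecordSketch {
    // Assumed to mirror the query: 1-minute tumbling window.
    static final long WINDOW_MS = 60_000L;
    // Assumed to mirror the DDL: WATERMARK FOR rowtime AS rowtime - INTERVAL '1' SECOND.
    static final long WATERMARK_DELAY_MS = 1_000L;

    /** Start of the tumbling window that contains the given timestamp. */
    public static long windowStart(long ts) {
        return ts - Math.floorMod(ts, WINDOW_MS);
    }

    /** Counts records that this simplified model would drop as late. */
    public static int countDropped(long[] timestampsInArrivalOrder) {
        long watermark = Long.MIN_VALUE;
        int dropped = 0;
        for (long ts : timestampsInArrivalOrder) {
            long windowEnd = windowStart(ts) + WINDOW_MS;
            // The window is treated as closed once the watermark
            // reaches its last millisecond (windowEnd - 1).
            if (windowEnd - 1 <= watermark) {
                dropped++;
            }
            // Simplification: advance the watermark after every record.
            watermark = Math.max(watermark, ts - WATERMARK_DELAY_MS);
        }
        return dropped;
    }

    public static void main(String[] args) {
        // Hypothetical arrival order; 59_900 arrives after 61_500,
        // i.e. more than one second out of order.
        long[] arrivals = {10_000L, 59_500L, 61_500L, 59_900L, 62_000L};
        System.out.println("dropped = " + countDropped(arrivals));
    }
}
```

In the sample input, the record with timestamp 59_900 arrives after the watermark has already passed the end of the first window, so the model counts it as dropped. Whether my Kafka data is actually out of order by more than one second is something I still need to check.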
Please help!