Context
I have a Flink job written with the Python SQL API. It consumes source data from Kinesis and writes its results to Kinesis. To check locally that the application code is correct, I replaced both the Kinesis source and the Kinesis sink with the filesystem connector and ran the test pipeline on my machine. The local Flink job always finishes successfully, but the sink file is always empty. The same thing happens when I run the code in the Flink SQL Client.
Here is my code:
CREATE TABLE incoming_data (
    requestId VARCHAR(4),
    groupId VARCHAR(32),
    userId VARCHAR(32),
    requestStartTime VARCHAR(32),
    processTime AS PROCTIME(),
    requestTime AS TO_TIMESTAMP(SUBSTR(REPLACE(requestStartTime, 'T', ' '), 0, 23), 'yyyy-MM-dd HH:mm:ss.SSS'),
    WATERMARK FOR requestTime AS requestTime - INTERVAL '5' SECOND
) WITH (
    'connector' = 'filesystem',
    'path' = '/path/to/test/json/file.json',
    'format' = 'json',
    'json.timestamp-format.standard' = 'ISO-8601'
);
CREATE TABLE user_latest_request (
    groupId VARCHAR(32),
    userId VARCHAR(32),
    latestRequestTime TIMESTAMP
) WITH (
    'connector' = 'filesystem',
    'path' = '/path/to/sink',
    'format' = 'csv'
);
INSERT INTO user_latest_request
SELECT groupId,
       userId,
       MAX(requestTime) AS latestRequestTime
FROM incoming_data
GROUP BY TUMBLE(processTime, INTERVAL '1' SECOND), groupId, userId;
Curious what I am doing wrong here.
Note:
- I am using Flink 1.11.0
- If I copy data directly from the source to the sink without windowing and grouping, it works fine. That means the source and sink tables are set up correctly, so the problem seems to be in the tumbling window and grouping when used with the local filesystem connector.
- This code works fine with Kinesis source and sink.
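For reference, the pass-through variant from the second note looked roughly like the sketch below. The exact column list is an assumption written out for illustration; it reuses the same two tables and only drops the window and the aggregation.

```sql
-- Pass-through check (assumed shape): no TUMBLE window, no GROUP BY.
-- Uses the incoming_data and user_latest_request tables defined above.
INSERT INTO user_latest_request
SELECT groupId,
       userId,
       requestTime AS latestRequestTime
FROM incoming_data;
```

With this statement the sink receives rows, while the windowed statement against the same tables leaves it empty.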