Below is a Flink program (Java) that reads tweets from a file, extracts the hashtags, counts the occurrences of each hashtag, and finally writes the result to a file.
The program uses a sliding window of size 20 seconds that slides by 5 seconds. The sink writes all output to a single file named outfile, so every 5 seconds a window fires and its results are written to outfile.
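To make the window semantics concrete, here is a small plain-Java sketch (no Flink dependency) of how I understand a 20-second window sliding by 5 seconds to assign a record to windows. The class and method names (`SlidingWindowSketch`, `windowsFor`) are made up for illustration, and the sketch assumes a window offset of 0 and non-negative timestamps in milliseconds; it is not Flink's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: lists the sliding windows
// that contain a given event timestamp.
class SlidingWindowSketch {

    /** Returns the [start, end) bounds, in ms, of every window containing timestampMs. */
    static List<long[]> windowsFor(long timestampMs, long sizeMs, long slideMs) {
        List<long[]> windows = new ArrayList<>();
        // Start of the latest window containing the timestamp
        // (assumes offset 0 and a non-negative timestamp).
        long lastStart = timestampMs - (timestampMs % slideMs);
        // Walk backwards one slide at a time while the window still covers the timestamp.
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            windows.add(new long[] {start, start + sizeMs});
        }
        return windows;
    }

    public static void main(String[] args) {
        // A record with event time 12 s, window size 20 s, slide 5 s.
        for (long[] w : windowsFor(12_000L, 20_000L, 5_000L)) {
            System.out.println("[" + w[0] + ", " + w[1] + ")");
        }
    }
}
```

With these settings each record falls into size / slide = 4 overlapping windows, and the end times of those windows are 5 seconds apart, which is why one window fires every 5 seconds.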
My problem:
I want the output of each window firing (that is, every 5 seconds) to be written to a new file instead of being appended to the same file. Where and how can this be done? Do I need a custom trigger, some sink configuration, or something else?
Code:
<!-- language: lang-java -->
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
    env.getConfig().setAutoWatermarkInterval(100);
    env.enableCheckpointing(5000, CheckpointingMode.EXACTLY_ONCE);
    env.getCheckpointConfig().setMinPauseBetweenCheckpoints(5000);

    String path = "C:\\Users\\eventTime";

    // Read data from the files in the eventTime folder.
    DataStream<String> streamSource = env
            .readFile(new TextInputFormat(new Path(path)), path, FileProcessingMode.PROCESS_CONTINUOUSLY, 1000)
            .uid("read-1");

    // Extract the hashtags from the tweets.
    DataStream<Tuple3<String, Integer, Long>> mapStream = streamSource.map(new ExtractHashTagFunction());

    // Extract the timestamps from the tweets and generate watermarks.
    DataStream<Tuple3<String, Integer, Long>> withTimestampsAndWatermarks =
            mapStream.assignTimestampsAndWatermarks(new MyTimestampsAndWatermarks());

    KeyedStream<Tuple3<String, Integer, Long>, Tuple> keyedStream = withTimestampsAndWatermarks.keyBy(0);

    // Sliding window of 20 seconds that slides by 5 seconds.
    SingleOutputStreamOperator<Tuple4<String, Integer, Long, String>> aggregatedStream = keyedStream
            .window(SlidingEventTimeWindows.of(Time.seconds(20), Time.seconds(5)))
            .aggregate(new AggregateHashTagCountFunction())
            .uid("agg-123");

    // All window results currently go to the same output path.
    aggregatedStream.writeAsText("C:\\Users\\outfile", WriteMode.NO_OVERWRITE).setParallelism(1).uid("write-1");

    env.execute("twitter-analytics");