In my Flink CEP application, I have a text file containing simple data (timestamp, type) as follows:

1,A
2,B
3,C
4,A
5,C
6,B
7,D
8,D
9,A
10,D

I can read this file and create an event stream from it; each event has a long field called "timestamp" and a string field called "type". The problem is that the event stream generated from this file is out of order. I checked this both with the print() method and by writing the event stream to a text file. The output looks something like this:

9:A
1:A
10:D
5:C
3:C
2:B
7:D
6:B
4:A
8:D

My code is here:

public static void main(String[] args) throws Exception {



        // Set up the Flink execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // Define the input data format
        TextInputFormat inputFormat = new TextInputFormat(new Path("/home/majidlotfian/flink/flink-quickstart/PLprivacy/input_folder/input.txt"));

        // Read the input data from a file
        DataStream<DataEvent> eventStream = env.readFile(inputFormat, "/home/majidlotfian/flink/flink-quickstart/PLprivacy/input_folder/input.txt")
                .map(new MapFunction<String, DataEvent>() {
                    @Override
                    public DataEvent map(String value) throws Exception {
                        // Parse the line into an event object
                        String[] fields = value.split(",");
                        long timestamp = Long.parseLong(fields[0]);
                        String type = fields[1];
                        DataEvent event = new DataEvent(timestamp,type);
                        //event.setTimestamp(timestamp);
                        return event;
                    }
                })
                // Assign timestamps and watermarks
                .assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks<DataEvent>() {
                    private long currentMaxTimestamp;
                    private final long maxOutOfOrderness = 10000; // 10 seconds

                    @Nullable
                    @Override
                    public Watermark getCurrentWatermark() {
                        return new Watermark(currentMaxTimestamp - maxOutOfOrderness);
                    }

                    @Override
                    public long extractTimestamp(DataEvent element, long previousElementTimestamp) {
                        long timestamp = element.getTimestamp();
                        currentMaxTimestamp = Math.max(currentMaxTimestamp, timestamp);
                        return timestamp;
                    }
                });

        // Partition the events by their timestamp field and group them into 5-second windows
        DataStream<DataEvent> windowedEvents = eventStream
                .keyBy("timestamp")
                .window(TumblingEventTimeWindows.of(Time.seconds(5)))
                .process(new ProcessWindowFunction<DataEvent, DataEvent, Tuple, TimeWindow>() {
                    @Override
                    public void process(Tuple key, Context context, Iterable<DataEvent> elements, Collector<DataEvent> out) throws Exception {
                        // Sort the events within the window based on their timestamp field
                        List<DataEvent> events = new ArrayList<>();
                        for (DataEvent event : elements) {
                            events.add(event);
                        }
                        Collections.sort(events, new Comparator<DataEvent>() {
                            @Override
                            public int compare(DataEvent event1, DataEvent event2) {
                                return Long.compare(event1.getTimestamp(), event2.getTimestamp());
                            }
                        });
                        for (DataEvent event : events) {
                            out.collect(event);
                        }
                    }
                });

        // Print the windowed event stream
        windowedEvents.print();

        // Write the windowed events to a text file
        String outputPath = "/home/majidlotfian/flink/flink-quickstart/PLprivacy/output_folder/output.txt";
        windowedEvents.map(new MapFunction<DataEvent, String>() {
                    @Override
                    public String map(DataEvent value) throws Exception {
                        return value.getTimestamp()+":"+value.getType();
                    }
                })
                .writeAsText(outputPath, FileSystem.WriteMode.OVERWRITE)
                .setParallelism(1);  // ensure that events are written in order


        env.execute("EventStreamCEP");
  }
}

My question is: how can I correct the out-of-order events? Is this a problem with reading from a file?

I tried assigning timestamps and watermarks, but it did not work.

2 Answers

Here is an example of how to retrieve the values of a map sorted by key:

    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    // Class name is illustrative; the original snippet showed only the main method.
    public class SortedMapExample {
        public static void main(String[] args) {
            HashMap<Integer, Character> map = new HashMap<>();

            map.put(1, 'A');
            map.put(2, 'B');
            map.put(3, 'C');
            map.put(4, 'A');
            map.put(5, 'C');
            map.put(6, 'B');
            map.put(7, 'D');
            map.put(8, 'D');
            map.put(9, 'A');
            map.put(10, 'D');

            System.out.println("Sorted by keySet stream:");
            map.keySet().stream()
                    .sorted()
                    .forEach(key -> System.out.println(map.get(key)));

            System.out.println("\nSorted by entrySet stream:");
            map.entrySet().stream()
                    // From here on, the stream's entries are ordered by key,
                    // so any further logic sees them in sorted order
                    .sorted(Comparator.comparingInt(Map.Entry::getKey))
                    .forEach(System.out::println);

            // Collect the values into a list ordered by their keys
            List<Character> sortedByKeys = map.keySet().stream()
                    .sorted()
                    .map(map::get)
                    .collect(Collectors.toList());

            System.out.println("\nSorted list:");
            sortedByKeys.forEach(System.out::println);
        }
    }
Valentyn Hruzytskyi

You are sorting within each window, but in such a way that each window only has events for the same timestamp, so the sorting has no effect. The problem is caused by keyBy("timestamp").

If you want to do a global sort across all of the data, then don't use keyBy, and use

.windowAll(TumblingEventTimeWindows.of(Time.seconds(5)))

If you want to sort independently for each type, then key by the type (rather than the timestamp), and continue using window (rather than windowAll).
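
To make the difference between the three keying choices concrete, here is a rough plain-Java sketch that does not use Flink at all. It only mimics how elements are bucketed by (key, tumbling window) before each bucket is sorted. The class `WindowGroupingSketch`, the `Event` class, and the `sortedGroups` helper are hypothetical names made up for this illustration, and the sketch assumes the timestamps are interpreted as milliseconds, so all ten sample events fall into the first 5-second window.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

public class WindowGroupingSketch {

    // Hypothetical stand-in for the question's DataEvent class.
    static final class Event {
        final long timestamp;
        final String type;

        Event(long timestamp, String type) {
            this.timestamp = timestamp;
            this.type = type;
        }

        @Override
        public String toString() {
            return timestamp + ":" + type;
        }
    }

    // Same length as Time.seconds(5), expressed in milliseconds.
    static final long WINDOW_SIZE_MS = 5000;

    // Buckets events by (key, tumbling window start) and sorts each bucket
    // by timestamp. Assumes non-negative timestamps.
    static Map<String, List<Event>> sortedGroups(List<Event> events,
                                                 Function<Event, String> keyOf) {
        Map<String, List<Event>> groups = new TreeMap<>();
        for (Event e : events) {
            long windowStart = e.timestamp - (e.timestamp % WINDOW_SIZE_MS);
            String bucket = keyOf.apply(e) + "@" + windowStart;
            groups.computeIfAbsent(bucket, b -> new ArrayList<>()).add(e);
        }
        for (List<Event> bucket : groups.values()) {
            bucket.sort(Comparator.comparingLong((Event ev) -> ev.timestamp));
        }
        return groups;
    }

    // The ten events from the question, in the shuffled order it reported.
    static List<Event> sampleEvents() {
        return Arrays.asList(
                new Event(9, "A"), new Event(1, "A"), new Event(10, "D"),
                new Event(5, "C"), new Event(3, "C"), new Event(2, "B"),
                new Event(7, "D"), new Event(6, "B"), new Event(4, "A"),
                new Event(8, "D"));
    }

    public static void main(String[] args) {
        List<Event> events = sampleEvents();

        // Like keyBy("timestamp"): every bucket holds a single event.
        System.out.println(sortedGroups(events, e -> String.valueOf(e.timestamp)));

        // Like keyBy on the type: one sorted bucket per type.
        System.out.println(sortedGroups(events, e -> e.type));

        // Like windowAll: a constant key, so one sorted bucket per window.
        System.out.println(sortedGroups(events, e -> "all"));
    }
}
```

With the timestamp as the key, each bucket contains exactly one element, so sorting inside it cannot change anything. With a constant key (the windowAll case) the single bucket comes out ordered by timestamp. This only models the grouping; it says nothing about how a real Flink job orders output across parallel subtasks.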

David Anderson
  • The weird thing is that after applying your advice, the stream is ordered and I can detect patterns over it, but printing the event stream and writing it to a file still give me out-of-order output. – Majid Lotfian Delouee Mar 24 '23 at 15:29