Using Flink RichSourceFunction
I am reading a file whose events are sorted by a timestamp field. The file is very large (500 GB). I read it sequentially using a single split (TimeStampedFileSplit) for the whole file and a partition count of 1. I am not using any watermarks or windowing for now. After reading the file, I perform a KeyBy operation on a different field, which distributes the data across multiple partitions. After this distribution, I notice that in some partitions the events are no longer sorted by timestamp. Why can this happen, and how can I make sure the events are always sorted? Is there a theoretical explanation or proof for this?
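To make the observation concrete, here is a minimal sketch in plain Java (no Flink dependency) of how the ordering inside one partition could be checked. The `Event` class, its `key` and `timestamp` fields, and the `findOutOfOrder` helper are all hypothetical names chosen for illustration; the real job's record type and how the records are collected from a partition are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerKeyOrderCheck {

    /** Hypothetical event shape: the keyBy field plus an event timestamp (ms). */
    public static final class Event {
        final String key;
        final long timestamp;

        Event(String key, long timestamp) {
            this.key = key;
            this.timestamp = timestamp;
        }
    }

    /**
     * Scans events in arrival order and returns those whose timestamp is
     * smaller than the largest timestamp already seen for the same key.
     */
    public static List<Event> findOutOfOrder(List<Event> received) {
        Map<String, Long> maxSeen = new HashMap<>();
        List<Event> violations = new ArrayList<>();
        for (Event e : received) {
            Long prev = maxSeen.get(e.key);
            if (prev != null && e.timestamp < prev) {
                violations.add(e);              // arrived after a newer event of the same key
            } else {
                maxSeen.put(e.key, e.timestamp); // new high-water mark for this key
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<Event> received = new ArrayList<>();
        received.add(new Event("a", 1L));
        received.add(new Event("b", 2L));
        received.add(new Event("a", 5L));
        received.add(new Event("a", 3L)); // out of order for key "a"
        received.add(new Event("b", 4L));

        for (Event e : findOutOfOrder(received)) {
            System.out.println("out of order: key=" + e.key + " ts=" + e.timestamp);
        }
    }
}
```

Running such a check both per key and per partition (by using the partition index as the `key`) would show whether the disorder is within a single key's events or only between different keys that share a partition.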
user3388770