I would like to explain my problem statement by first describing the scenario below.
Scenario: I am working on continuous file reading using Flink's FileProcessingMode.PROCESS_CONTINUOUSLY mode, with Flink and Java 8.
This is essentially batch-style reading: different files are received at different times during the day. Say file_1.csv arrives at 3:00 PM; my Flink job reads it. Then file_2.csv arrives at 3:30 PM and the job reads that file as well, and the process continues this way until the job stops. We sink this data to Kafka.
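For context, here is a minimal sketch of the kind of setup I mean (the directory path, Kafka address, and topic name are placeholders, and the Kafka sink configuration is simplified):

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

import java.util.Properties;

public class ContinuousFileReadJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        String inputDir = "/data/incoming"; // placeholder directory
        TextInputFormat format = new TextInputFormat(new Path(inputDir));

        // Monitor the directory every 30 seconds and pick up newly arrived files.
        DataStream<String> lines = env.readFile(
                format,
                inputDir,
                FileProcessingMode.PROCESS_CONTINUOUSLY,
                30_000L);

        // Sink each line to Kafka (producer properties simplified).
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        lines.addSink(new FlinkKafkaProducer<>("my-topic", new SimpleStringSchema(), props));

        env.execute("continuous-file-read");
    }
}
```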
Problem: When I restart the Flink job, it starts reading all previously read files again, which means the same records are re-emitted to Kafka on every restart.
Is there any way to prevent this data duplication?