how can i filter out duplicates over an infinite stream with a time window purge? I dont have infinite space / ram and i know that after say, 2 seconds (on the local clock), any duplicate that can occur will have occured. which means that after 2 seconds i can throw away (purge) the old data.
Filtering duplicates over an infinite stream with time window purge.
I got an excellent answer to how to remove the duplicates here in this question (big thanks to Till): apache flink 0.10 how to get the first occurence of a composite key from an unbounded input dataStream?
but i dont know how to tell flink to throw away the old data after 2 seconds (local time).
how can i do this with flink 0.10 please?
Thanks alot!!!
here is the statement which removes duplicates but doesnt purge:
input.keyBy(0, 1).flatMap(new DuplicateFilter()).print();
if I add .timeWindow(Time.minutes(1), Time.seconds(30))
after keyBy(0, 1)
its not compilable.