How can I filter out duplicates over an infinite stream with a time-window purge? I don't have infinite space/RAM, and I know that after, say, 2 seconds (on the local clock), any duplicate that can occur will have occurred. That means that after 2 seconds I can throw away (purge) the old data.

Filtering duplicates over an infinite stream with time window purge.

I got an excellent answer on how to remove the duplicates in this question (big thanks to Till): apache flink 0.10 how to get the first occurence of a composite key from an unbounded input dataStream?

But I don't know how to tell Flink to throw away the old data after 2 seconds (local time).

How can I do this with Flink 0.10, please?

Thanks a lot!

Here is the statement that removes duplicates but doesn't purge:

input.keyBy(0, 1).flatMap(new DuplicateFilter()).print();

If I add .timeWindow(Time.minutes(1), Time.seconds(30)) after keyBy(0, 1), it doesn't compile.

Ricardo Alvaro Lohmann
timmy_stapler

1 Answer

Thanks to Till, the answer is given in an update to the following question: apache flink 0.10 how to get the first occurence of a composite key from an unbounded input dataStream?

See the update.
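For readers who just want the general idea of "deduplicate, but forget keys after 2 seconds", here is a minimal sketch in plain Java. It is not the code from the linked update and it does not use any Flink API; the class name TimeWindowDeduplicator and its methods are hypothetical, made up for illustration. It assumes the timestamps passed in never decrease (for example, readings of the local clock), so the oldest entries are always at the front of the map.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Flink): remembers when each key was
 * first seen and forgets it once it is at least ttlMillis old.
 */
class TimeWindowDeduplicator<K> {
    private final long ttlMillis;

    // Insertion order equals first-seen order, assuming timestamps never decrease.
    private final LinkedHashMap<K, Long> firstSeen = new LinkedHashMap<>();

    TimeWindowDeduplicator(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns true if the key was not seen within the last ttlMillis. */
    boolean isFirst(K key, long nowMillis) {
        purgeOlderThan(nowMillis - ttlMillis);
        if (firstSeen.containsKey(key)) {
            return false; // duplicate inside the window: drop it
        }
        firstSeen.put(key, nowMillis);
        return true;
    }

    /** Removes entries whose first-seen time is at or before the cutoff. */
    private void purgeOlderThan(long cutoffMillis) {
        Iterator<Map.Entry<K, Long>> it = firstSeen.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() > cutoffMillis) {
                break; // everything after this entry is newer
            }
            it.remove();
        }
    }

    /** Number of keys currently remembered. */
    int size() {
        return firstSeen.size();
    }
}
```

A filter function would call isFirst(key, System.currentTimeMillis()) for each element and emit the element only when it returns true. Memory stays bounded by the number of distinct keys seen in any 2-second span, because older entries are purged on every call.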

timmy_stapler