I have checkpointing set up in my Flink job, which has 2 sliding windows (these aren't joins) and 1 tumbling window join. I don't really need to save the state for the join itself, since saving the state for the 2 sliding windows is enough. The join ends up holding 20-30 GB of state, which causes the job to lag and crash, and the checkpoint never finishes saving.
How can I accomplish this?
I am trying something like:
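    import java.util.ArrayList;
    import java.util.List;
    import org.apache.flink.api.common.functions.JoinFunction;
    import org.apache.flink.streaming.api.checkpoint.ListCheckpointed;

    public class CustomJoin implements JoinFunction<A, A, A>, ListCheckpointed<A> {
        @Override
        public A join(A a, A b) {
            // Some irrelevant join logic
        }

        // Deliberately snapshot nothing for this function.
        @Override
        public List<A> snapshotState(long l, long l1) throws Exception {
            return new ArrayList<>();
        }

        // Nothing to restore, since nothing was snapshotted.
        @Override
        public void restoreState(List<A> list) throws Exception {
        }
    }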
Does this actually avoid storing state for the join? It's called like:
    stream
        .assignTimestampsAndWatermarks(...)
        .join(secondStream.assignTimestampsAndWatermarks(...))
        .where(KeySelector...)
        .equalTo(KeySelector...)
        .window(TumblingEventTimeWindows.of(Time.minutes(1L)))
        .trigger(EventTimeTrigger.create())
        .apply(new CustomJoin());
Does this work in practice? What's the best way to avoid storing state?
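To make my doubt concrete, here is a toy model in plain Java of what I suspect might be happening. It has no Flink dependency, and every class and method name in it is hypothetical. The assumption (which I have not confirmed) is that the window operator itself buffers the elements of both inputs until the window fires, so an empty snapshot from the join function would not shrink what the checkpoint has to persist.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT Flink APIs.
public class WindowStateSketch {

    // Stands in for the user join function, which snapshots nothing of its own.
    static class EmptySnapshotJoin {
        String join(String left, String right) {
            return left + "|" + right;
        }

        List<String> snapshotState() {
            return new ArrayList<>();
        }
    }

    // Stands in for a window operator that buffers both inputs until the window fires.
    static class ToyWindowOperator {
        private final EmptySnapshotJoin function;
        private final List<String> leftBuffer = new ArrayList<>();
        private final List<String> rightBuffer = new ArrayList<>();

        ToyWindowOperator(EmptySnapshotJoin function) {
            this.function = function;
        }

        void addLeft(String value) {
            leftBuffer.add(value);
        }

        void addRight(String value) {
            rightBuffer.add(value);
        }

        // Elements a checkpoint would persist in this model: the operator's
        // own buffers plus whatever the function reports.
        int checkpointedElements() {
            return leftBuffer.size() + rightBuffer.size() + function.snapshotState().size();
        }

        // Firing the window emits the pairwise join and clears the buffers.
        List<String> fire() {
            List<String> out = new ArrayList<>();
            for (String l : leftBuffer) {
                for (String r : rightBuffer) {
                    out.add(function.join(l, r));
                }
            }
            leftBuffer.clear();
            rightBuffer.clear();
            return out;
        }
    }

    public static void main(String[] args) {
        ToyWindowOperator op = new ToyWindowOperator(new EmptySnapshotJoin());
        op.addLeft("a1");
        op.addLeft("a2");
        op.addRight("b1");
        System.out.println(op.checkpointedElements()); // 3
        System.out.println(op.fire());                  // [a1|b1, a2|b1]
        System.out.println(op.checkpointedElements()); // 0
    }
}
```

If Flink behaves like this model, the empty snapshotState in CustomJoin would change nothing, because the buffered window contents live outside the function.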