As was discussed before:
Whenever multiple streams are connected, the resulting watermark is
the minimum of the incoming watermarks
and
which means flink was buffering data from stream1 while waiting for
stream2 and would eventually result in High Backpressure and finally a
OOM.
It works for coGroup()
method from the DataStream<T>
class which returns CoGroupedStreams<T, T2>
.
To avoid such behavior we can use union(DataStream<T>... streams)
method which returns a simple DataStream<T>
where the watermarks will be advancing as in a usual stream.
The only problem which we need to solve is to have a common schema (class) for both streams. We can use some aggregation class with two fields:
public class Aggregator {
private FlatHighFrequencyAnalog flatHighFrequencyAnalog;
private EVWindow evWindow;
public Aggregator(FlatHighFrequencyAnalog flatHighFrequencyAnalog) {
this.flatHighFrequencyAnalog = flatHighFrequencyAnalog;
}
public Aggregator(EVWindow evWindow) {
this.evWindow = evWindow;
}
public FlatHighFrequencyAnalog getFlatHighFrequencyAnalog() {
return flatHighFrequencyAnalog;
}
public EVWindow getEVWindow() {
return evWindow;
}
}
Also, a more generic way is to use Either<L, R>
class from org.apache.flink.types
.
Let's summarize what we'll have in the end:
SingleOutputStreamOperator<Either<EVWindow, FlatHighFrequencyAnalog>> stream1 =
environment
.addSource(createHFAConsumer())
.map(hfa -> Either.Left(hfa));
SingleOutputStreamOperator<Either<EVWindow, FlatHighFrequencyAnalog>> stream2 =
environment
.addSource(createHFDConsumer())
.map(hfd -> Either.Right(hfd));
DataStream<Message> pStream =
stream1
.union(stream2)
.assignTimestampsAndWatermarks(
WatermarkStrategy
.<Either<EVWindow, FlatHighFrequencyAnalog>>forBoundedOutOfOrderness(
ofSeconds(MAX_OUT_OF_ORDERNESS))
.withTimestampAssigner((input, timestamp) -> input.isLeft() ? input.left().getTimeStamp() : input.right().getTimeStamp()))
.keyBy(value -> value.isLeft() ? value.left().getId() : value.right().getId())
.window(TumblingEventTimeWindows.of(Time.minutes(MINUTES)))
.process(new ProcessWindowFunction());
To get different collections in the process function
List<EVWindow> evWindows =
Streams.stream(elements)
.filter(Either::isLeft)
.map(Either::left)
.collect(Collectors.toList());
List<FlatHighFrequencyAnalog> highFrequencyAnalogs =
Streams.stream(elements)
.filter(Either::isRight)
.map(Either::right)
.collect(Collectors.toList());