I am looking for a simpler, better, and more elegant way of approaching the problem below. I have yet to come across any documentation on the topic, and I am sure my current approach has some bottlenecks. Thank you.
I have a stream where JSON is mapped to a POJO:
DataStream<MyPOJO> stream = env
    .addSource( <<kafkaSource>>)
    .map(new EventToPOJO());
Some of the POJOs will have a populated primary key, some will have a populated alternate key, and some will have both. The only example of working with two keys I have found in the Flink documentation uses a KeySelector for a composite key; there is nothing for alternate keys.
My current approach is as follows:
- Use a RichFlatMapFunction to collect all elements with a primary key into a stream, AStream
- Use a RichFlatMapFunction to collect all elements with an alternate key into a stream, BStream
- Use a RichFlatMapFunction for items that have both primary and alternate keys, CStream
- Join AStream with CStream on the primary key
- Join BStream with CStream on the alternate key
- Finally, keyBy the primary key
// AStream: every element that has a primary key
DataStream<MyPOJO> primaryKey = stream.flatMap(new RichFlatMapFunction<MyPOJO, MyPOJO>() {
    @Override
    public void flatMap(MyPOJO mypojo, Collector<MyPOJO> collector) throws Exception {
        if (mypojo.getPrimaryKey() != null) {
            collector.collect(mypojo);
        }
    }
});
// BStream: every element that has an alternate key
DataStream<MyPOJO> alternateKey = stream.flatMap(new RichFlatMapFunction<MyPOJO, MyPOJO>() {
    @Override
    public void flatMap(MyPOJO mypojo, Collector<MyPOJO> collector) throws Exception {
        if (mypojo.getAlternateKey() != null) {
            collector.collect(mypojo);
        }
    }
});
// CStream: elements that have both keys
DataStream<MyPOJO> both = stream.flatMap(new RichFlatMapFunction<MyPOJO, MyPOJO>() {
    @Override
    public void flatMap(MyPOJO mypojo, Collector<MyPOJO> collector) throws Exception {
        if (mypojo.getAlternateKey() != null && mypojo.getPrimaryKey() != null) {
            collector.collect(mypojo);
        }
    }
});
// Join them
both.join(alternateKey)
    .where(MyPOJO::getAlternateKey)
    .equalTo(MyPOJO::getAlternateKey)
    .window(TumblingEventTimeWindows.of(Time.milliseconds(1)))
    .apply(new JoinFunction<MyPOJO, MyPOJO, MyPOJO>() {
        @Override
        public MyPOJO join(MyPOJO first, MyPOJO second) throws Exception {
            // Some join logic to keep both states (elided here)
            return second;
        }
    });
// ... repeat for the primary key stream ...
// keyBy at the end
both.keyBy(MyPOJO::getPrimaryKey);
I'm sure I could use a filter function as well to get the 3 streams, but I would rather not have to split into 3 streams in the first place. Please note I have simplified the above for readability's sake, so please don't mind any syntax errors you may find.
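For reference, the filter-based variant would look roughly like the sketch below. It is only an illustration under a few assumptions: `java.util.stream` stands in for the Flink `DataStream` so the snippet runs without a cluster, `MyPOJO` is a stripped-down hypothetical version with just the two key getters, and the names `KeyFilterSketch`, `HAS_PRIMARY`, `HAS_ALTERNATE`, `HAS_BOTH` and `select` are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class KeyFilterSketch {

    // Hypothetical stand-in for the real POJO; only the two key getters matter here.
    public static final class MyPOJO {
        private final String primaryKey;
        private final String alternateKey;

        public MyPOJO(String primaryKey, String alternateKey) {
            this.primaryKey = primaryKey;
            this.alternateKey = alternateKey;
        }

        public String getPrimaryKey() { return primaryKey; }
        public String getAlternateKey() { return alternateKey; }
    }

    // Same null checks as the three flatMap functions above, written as predicates.
    public static final Predicate<MyPOJO> HAS_PRIMARY = p -> p.getPrimaryKey() != null;
    public static final Predicate<MyPOJO> HAS_ALTERNATE = p -> p.getAlternateKey() != null;
    public static final Predicate<MyPOJO> HAS_BOTH = HAS_PRIMARY.and(HAS_ALTERNATE);

    // Stand-in for stream.filter(...): keeps only the elements matching the predicate.
    public static List<MyPOJO> select(List<MyPOJO> events, Predicate<MyPOJO> predicate) {
        return events.stream().filter(predicate).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<MyPOJO> events = Arrays.asList(
                new MyPOJO("p1", null),   // primary key only
                new MyPOJO(null, "a1"),   // alternate key only
                new MyPOJO("p2", "a2"));  // both keys

        System.out.println("primary:   " + select(events, HAS_PRIMARY).size());
        System.out.println("alternate: " + select(events, HAS_ALTERNATE).size());
        System.out.println("both:      " + select(events, HAS_BOTH).size());
    }
}
```

In the actual job the same checks would presumably be passed as lambdas to `stream.filter(...)`, which is shorter than the flatMap version but still leaves me with 3 streams to join.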