I am reading from two streams: one with records and one with metadata.
When the application starts for the first time, I want it to build the metadata by scanning the complete table and save it to Flink's MapState. Updates to the table will be captured via the metadata stream, and the MapState will be updated accordingly.
From the second start onward, I want to use the MapState restored from the snapshot instead of reading the entire table.
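The intended lifecycle (scan the table once, then apply stream updates, and skip the scan when state is restored) can be modeled in plain Java. This is only a sketch under stated assumptions: a `HashMap` stands in for Flink's `MapState`, a `Supplier` stands in for the table scan, and `MetadataCache` with its methods are hypothetical names, not Flink APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical plain-Java model of the intended behavior (not Flink code).
// The HashMap stands in for MapState; the Supplier stands in for the table scan.
class MetadataCache {
    private final Map<String, String> state;            // id -> metadata payload
    private final Supplier<Map<String, String>> tableScan;
    private int scans = 0;

    MetadataCache(Map<String, String> restoredState, Supplier<Map<String, String>> tableScan) {
        this.state = restoredState;
        this.tableScan = tableScan;
    }

    // Scan the table only when the state is empty (nothing restored from a snapshot).
    void bootstrapIfEmpty() {
        if (!state.isEmpty()) {
            return;
        }
        scans++;
        state.putAll(tableScan.get());
    }

    // An update from the metadata stream overwrites the stored entry.
    void applyUpdate(String id, String payload) {
        bootstrapIfEmpty();
        state.put(id, payload);
    }

    // A record looks up its metadata by id.
    String lookup(String id) {
        bootstrapIfEmpty();
        return state.get(id);
    }

    int scans() { return scans; }

    // Copy of the current state, standing in for a snapshot/savepoint.
    Map<String, String> snapshot() { return new HashMap<>(state); }
}

class MetadataCacheDemo {
    public static void main(String[] args) {
        Supplier<Map<String, String>> table = () -> Map.of("m1", "v1", "m2", "v1");
        // First start: nothing restored, so the table is scanned once.
        MetadataCache first = new MetadataCache(new HashMap<>(), table);
        first.applyUpdate("m2", "v2");
        System.out.println(first.lookup("m2") + " scans=" + first.scans()); // v2 scans=1
        // Restart: state restored from the first run's snapshot, no scan.
        MetadataCache second = new MetadataCache(first.snapshot(), table);
        System.out.println(second.lookup("m1") + " scans=" + second.scans()); // v1 scans=0
    }
}
```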
Below is my implementation of this functionality, but my MapState is always empty. Am I doing something wrong here?
import java.util.List;
import javax.inject.Inject;

import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
import org.apache.flink.util.Collector;

public class CustomCoFlatMap extends RichCoFlatMapFunction<Record, Metadata, Output> {

    private transient DataSource datasource;
    private transient MapState<String, Metadata> metadataState;

    @Inject
    public void setDataSource(DataSource datasource) {
        this.datasource = datasource;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        // Dagger injects the DataSource through setDataSource()
        final RichFunctionComponent component = DaggerRichFunctionComponent.builder()
                .richFunctionModule(RichFunctionModule.builder()
                        .runtimeContext(getRuntimeContext())
                        .build())
                .build();
        component.inject(this);

        // read MapState from snapshot
        metadataState = getRuntimeContext().getMapState(new MapStateDescriptor<String, Metadata>("metadataState",
                TypeInformation.of(new TypeHint<String>() {}), TypeInformation.of(new TypeHint<Metadata>() {})));
    }

    @Override
    public void flatMap2(Metadata metadata, Collector<Output> collector) throws Exception {
        // this should happen only when application starts for first time
        // from next time, application will read from snapshot
        readMetadataForFirstTime();
        // update metadata in MapState
        this.metadataState.put(metadata.getId(), metadata);
    }

    @Override
    public void flatMap1(Record record, Collector<Output> collector) throws Exception {
        readMetadataForFirstTime();
        Metadata metadata = this.metadataState.get(record.getId());
        Output output = new Output(record.getId(), metadata.getName(), metadata.getVersion(), metadata.getType());
        collector.collect(output);
    }

    private void readMetadataForFirstTime() throws Exception {
        if (this.metadataState.iterator().hasNext()) {
            // metadataState from snapshot has data
            // not reading from table
            return;
        }
        // do this only once
        // read metadata from table and add it to MapState
        List<Metadata> metadataList = datasource.listAllMetadata();
        for (Metadata metadata : metadataList) {
            this.metadataState.put(metadata.getId(), metadata);
        }
    }
}
EDIT: Rest of the application:
DataStream<Metadata> metadataKeyedStream =
        env.addSource(metadataStream)
                .keyBy(Metadata::getId);

SingleOutputStreamOperator<Output> outputStream =
        env.addSource(recordStream)
                .assignTimestampsAndWatermarks(new RecordTimeExtractor())
                .keyBy(Record::getId)
                .connect(metadataKeyedStream)
                .flatMap(new CustomCoFlatMap());
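Because both inputs go through `keyBy`, the MapState obtained in `open()` is keyed state: Flink keeps a separate map per key, and inside `flatMap1`/`flatMap2` the function only sees the map belonging to the key of the element currently being processed. The sketch below is a simplified plain-Java model of that scoping, not Flink's implementation; `KeyedMapStateModel` and its methods are hypothetical. It illustrates that an emptiness check like the one in `readMetadataForFirstTime()` is evaluated per key rather than across the whole state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of keyed MapState scoping (not Flink code).
// Each key owns a separate inner map; all operations touch only the map of
// the key that is currently set.
class KeyedMapStateModel<K, UK, UV> {
    private final Map<K, Map<UK, UV>> perKey = new HashMap<>();
    private K currentKey;

    // In Flink, the runtime sets the current key before each element is processed.
    void setCurrentKey(K key) { this.currentKey = key; }

    private Map<UK, UV> current() {
        return perKey.computeIfAbsent(currentKey, k -> new HashMap<>());
    }

    void put(UK userKey, UV value) { current().put(userKey, value); }

    UV get(UK userKey) { return current().get(userKey); }

    boolean isEmpty() { return current().isEmpty(); }
}

class KeyedMapStateDemo {
    public static void main(String[] args) {
        KeyedMapStateModel<String, String, String> state = new KeyedMapStateModel<>();
        state.setCurrentKey("id-1");
        state.put("id-1", "metadata for id-1");
        state.setCurrentKey("id-2");
        // The entry written while key id-1 was current is not visible under key id-2.
        System.out.println(state.isEmpty());   // true
        System.out.println(state.get("id-1")); // null
    }
}
```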