The YouTube video and GitHub repo I linked to in this answer cover a number of similar scenarios. But the best way to bootstrap Flink state is to preload the data into a savepoint using the State Processor API.
Keep in mind that Flink's MapState is a kind of key-partitioned state. So if you use MapState<Metadata::Id, Metadata>, that is effectively a Map<KEY, MapState<Metadata::Id, Metadata>> sharded across the cluster by KEY.
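To make that concrete, here's a rough sketch of a keyed function maintaining such a map. The Metadata type, its getId() accessor, and the class/state names are placeholders I'm assuming for illustration, not something from your job:

import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class MetadataEnricher
        extends KeyedProcessFunction<String, Metadata, Metadata> {

    private transient MapState<String, Metadata> metadataById;

    @Override
    public void open(Configuration parameters) {
        metadataById = getRuntimeContext().getMapState(
            new MapStateDescriptor<>("metadata", Types.STRING, Types.POJO(Metadata.class)));
    }

    @Override
    public void processElement(Metadata value, Context ctx, Collector<Metadata> out) throws Exception {
        // Each distinct key produced by keyBy sees its own, independent map
        metadataById.put(value.getId(), value);
        out.collect(value);
    }
}

// Stand-in for the asker's Metadata type (an assumption for this sketch)
class Metadata {
    public String id;
    public String getId() { return id; }
}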
Here's an example showing how to create a savepoint containing a ValueState<Integer>:
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.state.api.BootstrapTransformation;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateBootstrapFunction;

public class Bootstrap {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment bEnv = ExecutionEnvironment.getExecutionEnvironment();

        // Turn a bounded DataSet into keyed state, keyed by the string form of each element
        BootstrapTransformation<Integer> transform = OperatorTransformation
            .bootstrapWith(bEnv.fromElements(1, 2, 3))
            .keyBy(String::valueOf)
            .transform(new SimplestTransform());

        // Write a new savepoint; "my-operator-uid" must match the uid of the operator
        // that will read this state when a job is started from the savepoint
        Savepoint
            .create(new FsStateBackend("file:///tmp/checkpoints"), 256)
            .withOperator("my-operator-uid", transform)
            .write("file:///tmp/savepoints/");

        bEnv.execute();
    }

    public static class SimplestTransform
            extends KeyedStateBootstrapFunction<String, Integer> {

        ValueState<Integer> state;

        @Override
        public void open(Configuration parameters) {
            ValueStateDescriptor<Integer> descriptor =
                new ValueStateDescriptor<>("total", Types.INT);
            state = getRuntimeContext().getState(descriptor);
        }

        @Override
        public void processElement(Integer value, Context ctx) throws Exception {
            state.update(value);
        }
    }
}
This creates a sharded key/value map containing {"1": 1, "2": 2, "3": 3}.
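To pick this state up, the streaming job needs an operator with the same uid ("my-operator-uid"), keyed the same way, using a state descriptor with the same name and type ("total", Types.INT), and you start the job from the written savepoint (e.g. flink run -s file:///tmp/savepoints/savepoint-...). Here's a minimal sketch of that side; the fromElements source and the class names are assumptions for illustration, not part of the original example:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class RestoreJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(1, 2, 3, 4)       // stand-in source for whatever stream you process
            .keyBy(String::valueOf)        // must produce the same keys as the bootstrap job
            .process(new Accumulate())
            .uid("my-operator-uid")        // must match the uid passed to withOperator(...)
            .print();

        env.execute("restore from bootstrapped savepoint");
    }

    public static class Accumulate extends KeyedProcessFunction<String, Integer, Integer> {

        private transient ValueState<Integer> total;

        @Override
        public void open(Configuration parameters) {
            // Same state name and type as in SimplestTransform, so the bootstrapped values are found
            total = getRuntimeContext().getState(new ValueStateDescriptor<>("total", Types.INT));
        }

        @Override
        public void processElement(Integer value, Context ctx, Collector<Integer> out) throws Exception {
            Integer current = total.value();   // pre-loaded from the savepoint on first access
            int updated = (current == null ? 0 : current) + value;
            total.update(updated);
            out.collect(updated);
        }
    }
}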