
I use Flink to enrich a flow of inputs

case class Input( key: String, message: String )

with precomputed scores

case class Score( key: String, score: Int )

and produce an output

case class Output( key: String, message: String, score: Int )

Both the input and score streams are read from Kafka topics, and the resulting output stream is published to Kafka as well:

val processed = inputStream.connect( scoreStream )
                           .flatMap( new ScoreEnrichmentFunction )
                           .addSink( producer )
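Both streams are keyed on `key` before the connect (the keying is omitted from the snippet above); a minimal sketch of what that looks like:

```scala
// Keying sketch (omitted from the snippet above): both streams are keyed
// on the same field so that matching Input and Score records land on the
// same parallel instance, which is required for keyed ValueState.
val processed = inputStream.keyBy( _.key )
                           .connect( scoreStream.keyBy( _.key ) )
                           .flatMap( new ScoreEnrichmentFunction )
                           .addSink( producer )
```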

with the following ScoreEnrichmentFunction:

class ScoreEnrichmentFunction extends RichCoFlatMapFunction[Input, Score, Output]
{
    val scoreStateDescriptor = new ValueStateDescriptor[Score]( "saved scores", classOf[Score] )
    lazy val scoreState: ValueState[Score] = getRuntimeContext.getState( scoreStateDescriptor )

    override def flatMap1( input: Input, out: Collector[Output] ): Unit = 
    {
        Option( scoreState.value ) match {
            case None => out.collect( Output( input.key, input.message, -1 ) )
            case Some( score ) => out.collect( Output( input.key, input.message, score.score ) )  
        }
    }

    override def flatMap2( score: Score, out: Collector[Output] ): Unit = 
    {
        scoreState.update( score )
    } 
}

This works well. However, if I take a savepoint and cancel the Flink job, the scores stored in the ValueState are lost when I resume the job from the savepoint.

As I understand it, ScoreEnrichmentFunction needs to be extended with CheckpointedFunction

class ScoreEnrichmentFunction extends RichCoFlatMapFunction[Input, Score, Output] with CheckpointedFunction

but I struggle to understand how to implement the methods snapshotState and initializeState so that they work with keyed state:

override def snapshotState( context: FunctionSnapshotContext ): Unit = ???


override def initializeState( context: FunctionInitializationContext ): Unit = ???
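For reference, a minimal sketch of what these two methods typically look like. Note that this pattern is for operator (non-keyed) state; the class name and the `bufferedScores` field below are hypothetical, and `Score` is the case class defined above:

```scala
import scala.collection.JavaConverters._
import org.apache.flink.api.common.state.{ ListState, ListStateDescriptor }
import org.apache.flink.runtime.state.{ FunctionInitializationContext, FunctionSnapshotContext }
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction

// Sketch only: CheckpointedFunction manages operator (non-keyed) state.
// Keyed state such as the ValueState above is snapshotted automatically
// once checkpointing is enabled, so these methods are not needed for it.
class BufferingSketch extends CheckpointedFunction
{
    @transient private var checkpointedScores: ListState[Score] = _
    private val bufferedScores = scala.collection.mutable.ListBuffer[Score]()

    override def snapshotState( context: FunctionSnapshotContext ): Unit =
    {
        // Copy the in-memory buffer into the operator list state
        checkpointedScores.clear()
        bufferedScores.foreach( checkpointedScores.add )
    }

    override def initializeState( context: FunctionInitializationContext ): Unit =
    {
        val descriptor = new ListStateDescriptor[Score]( "buffered-scores", classOf[Score] )
        checkpointedScores = context.getOperatorStateStore.getListState( descriptor )
        if( context.isRestored )
            checkpointedScores.get.asScala.foreach( bufferedScores += _ )
    }
}
```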

Note that I use the following env:

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism( 2 )
env.setBufferTimeout( 1 )
env.enableCheckpointing( 1000 )
env.getCheckpointConfig.enableExternalizedCheckpoints( ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION )
env.getCheckpointConfig.setCheckpointingMode( CheckpointingMode.EXACTLY_ONCE )
env.getCheckpointConfig.setMinPauseBetweenCheckpoints( 500 )
env.getCheckpointConfig.setCheckpointTimeout( 60000 )
env.getCheckpointConfig.setFailOnCheckpointingErrors( false )
env.getCheckpointConfig.setMaxConcurrentCheckpoints( 1 )
  • This looks like it should work. FYI, snapshotState and initializeState are for non-keyed state, and aren't used with keyed state (I can't see that you are keying the streams, but I assume you are doing that in code you haven't shared). How are you doing the restart with the savepoint, and how are you determining that the state isn't being restored? – David Anderson Sep 28 '18 at 10:06
  • Also: are you trying to resume from a savepoint, or from an externalized checkpoint? – David Anderson Sep 28 '18 at 10:10
  • Indeed, scoreStream and inputStream are keyed. In order to check that the state is loaded, I check the value of Output.score in the output stream (output Kafka topic). If it is different from -1 I know the scores have been correctly loaded and the enrichment is OK. – david Sep 28 '18 at 10:30
  • I proceed as follows: I start the job with "bin/flink run myjar.jar", I send the scores to kafka (score topic), then I send the inputs (input topic) and I check that the output is OK (output topic). Then I cancel the job with "bin/flink cancel -s [:targetDirectory] :jobId" and I restore it with "./bin/flink run myjar.jar -s my-save-point-path". At that point I send a new series of inputs on the input topic and I check the output topic. – david Sep 28 '18 at 10:30
  • Which state backend are you using? – David Anderson Sep 28 '18 at 11:03
  • val backend = new FsStateBackend( "file:///data", true ); env.setStateBackend( backend ) – david Sep 28 '18 at 11:10
  • I am experimenting with the FsStateBackend. Eventually, I would like to use RocksDB. – david Sep 28 '18 at 11:12
  • What version of Flink are you using? – David Anderson Sep 28 '18 at 12:50
  • Apache Flink 1.6.0 – david Sep 28 '18 at 14:38
  • Hey David Anderson, so for keyed state should I use ListCheckpointed? Currently I save it with HDFS and get chk-3 with actual data written in it, but when I restart the program the state I saved doesn't seem to be reinitialized – Ricc Feb 01 '19 at 04:13

1 Answer


I think I found the problem. I was trying to use separate directories for the checkpoints and the savepoints, which meant the savepoint directory and the FsStateBackend directory were different.

Using the same directory in

val backend = new FsStateBackend( "file:/data", true )
env.setStateBackend( backend )

and when taking a savepoint

bin/flink cancel d75f4712346cadb4df90ec06ef257636 -s file:/data

solves the problem.
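Putting the whole cycle together (the job id and jar name are placeholders, following the commands quoted in the comments above):

```shell
# Start the job
bin/flink run myjar.jar

# Cancel with a savepoint, writing it into the same directory
# that the FsStateBackend uses
bin/flink cancel -s file:/data <jobId>

# Resume from the savepoint that was just taken
bin/flink run myjar.jar -s <savepoint-path>
```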

  • Can this only be done with the command line, or can you do it from the IDE too? I tried, but the initial state still doesn't pick up any data into the ListState – Ricc Feb 01 '19 at 04:21