I am developing a Spark Streaming application that uses the mapWithState function internally.

I need to set the checkpoint interval of the checkpointed state data manually.

This is my sample code:

var newContextCreated = false  // Flag to detect whether a new context was created or not

// Function to create a new StreamingContext and set it up
def creatingFunc(): StreamingContext = {

  // Create a StreamingContext
  val ssc = new StreamingContext(sc, Seconds(batchIntervalSeconds))

  // Create a stream that generates 1000 lines per second
  val stream = ssc.receiverStream(new DummySource(eventsPerSecond))

  // Split the lines into words, and create a paired (key-value) dstream
  val wordStream = stream.flatMap { _.split(" ") }.map(word => (word, 1))

  // This represents the emitted stream from the trackStateFunc. Since we emit every input
  // record with the updated value, this stream will contain the same # of records as the input dstream.
  val wordCountStateStream = wordStream.mapWithState(stateSpec)
  wordCountStateStream.print()

  // A snapshot of the state for the current batch. This dstream contains one entry per key.
  val stateSnapshotStream = wordCountStateStream.stateSnapshots()
  stateSnapshotStream.foreachRDD { rdd =>
    rdd.toDF("word", "count").registerTempTable("batch_word_count")
  }

  ssc.remember(Minutes(1))  // To make sure data is not deleted by the time we query it interactively

  ssc.checkpoint("dbfs:/streaming/trackstate/100")

  println("Creating function called to create new StreamingContext")
  newContextCreated = true
  ssc
}
Mahdi
  • What is your question? How to set the checkpoint interval? – Yuval Itzchakov Sep 02 '16 at 07:19
  • Yes, I want to set the checkpoint interval for the underlying stream that holds my state. Is there a stream that handles the state, so that I can access it and manage its checkpointing? When I want to checkpoint a DStream I use **DStream.checkpoint(interval); DStream.foreachRDD(_.count)**, and I want to do the same thing on the internal stream that holds the state. – Mahdi Sep 02 '16 at 10:06
  • You want each state to contain its own checkpoint interval? – Yuval Itzchakov Sep 02 '16 at 12:11
  • Yes, I want to set the checkpoint interval of my DStream's state, which could differ from the DStream's checkpoint interval. – Mahdi Sep 03 '16 at 11:56
  • Can you answer the question? How do I access the underlying stream that the mapWithState function uses for checkpointing? I want to change its checkpoint interval. – Mahdi Sep 05 '16 at 10:51
  • AFAIK, you can't have a different interval per stream. – Yuval Itzchakov Sep 05 '16 at 11:59
  • Hi, I think there is a global state after applying the mapWithState function. This state, however, is updated for each RDD coming into the DStream. I wanted to know whether these states are saved in a DStream, and if so, whether I can access the DStream that keeps them. – Mahdi Sep 07 '16 at 02:29
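
For reference, the DStream.checkpoint(interval) pattern mentioned in the comments can also be tried directly on the stream returned by mapWithState. The sketch below is only an illustration under assumptions: the object name, trackCount, and countWithStateCheckpoint are hypothetical, and it assumes that calling checkpoint on the returned MapWithStateDStream forwards the interval to the internal state stream, which should be verified against the Spark version in use.

```scala
import org.apache.spark.streaming.{Duration, State, StateSpec}
import org.apache.spark.streaming.dstream.{DStream, MapWithStateDStream}

object StateCheckpointSketch {

  // Hypothetical mapping function: keeps a running count per word and
  // emits (word, updatedCount) for every input record.
  def trackCount(word: String, one: Option[Int], state: State[Int]): (String, Int) = {
    val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
    state.update(sum)
    (word, sum)
  }

  // Applies mapWithState and requests a checkpoint interval on the result.
  // Assumption: the interval is passed on to the stream that holds the state.
  // Spark expects the interval to be a multiple of the batch interval.
  def countWithStateCheckpoint(
      wordStream: DStream[(String, Int)],
      interval: Duration): MapWithStateDStream[String, Int, Int, (String, Int)] = {
    val stateStream = wordStream.mapWithState(StateSpec.function(trackCount _))
    // checkpoint returns a plain DStream, so keep the original reference
    // in order to call stateSnapshots() on it afterwards.
    stateStream.checkpoint(interval)
    stateStream
  }
}
```

With the code from the question, this would amount to calling wordCountStateStream.checkpoint(Seconds(n)) right after mapWithState, where n is a multiple of batchIntervalSeconds, while still setting a checkpoint directory with ssc.checkpoint.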

0 Answers