Questions tagged [checkpointing]

105 questions
0 votes, 1 answer

Checkpointing in Python to catch the runtime state

I am trying to make my code more self-healing. For example, method 1 loads data from a CSV file into a Vertica database, and method 2 checks whether the number of rows in the database matches the number of lines in the CSV file.…
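One common way to make a two-step job like this self-healing is to record each completed step in a small checkpoint file and skip finished steps when the job is rerun. The following is a minimal sketch under stated assumptions, not a Vertica-specific solution: it uses Python's built-in sqlite3 as a stand-in for the Vertica connection, and the table name staging, its two columns, and the helper names (load_csv, verify_counts, run_pipeline) are all hypothetical.

```python
import csv
import json
import os
import sqlite3  # stand-in for a Vertica DB-API connection (assumption)
import tempfile


def load_checkpoint(path):
    """Return the set of completed step names, or an empty set if none recorded."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f))


def save_checkpoint(path, done):
    """Persist completed steps atomically: write a temp file, then rename it."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, path)


def load_csv(conn, csv_path):
    """Step 1 (hypothetical): insert every CSV data record into table staging."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        conn.executemany("INSERT INTO staging (a, b) VALUES (?, ?)", reader)
    conn.commit()


def verify_counts(conn, csv_path):
    """Step 2: compare the table's row count with the CSV's data-record count."""
    with open(csv_path, newline="") as f:
        csv_records = sum(1 for _ in csv.reader(f)) - 1  # exclude the header
    (db_rows,) = conn.execute("SELECT COUNT(*) FROM staging").fetchone()
    return db_rows == csv_records


def run_pipeline(conn, csv_path, ckpt_path):
    """Run both steps, skipping any that the checkpoint says already succeeded."""
    done = load_checkpoint(ckpt_path)
    if "load" not in done:
        conn.execute("DELETE FROM staging")  # discard any partial earlier load
        load_csv(conn, csv_path)
        done.add("load")
        save_checkpoint(ckpt_path, done)
    if "verify" not in done:
        if not verify_counts(conn, csv_path):
            done.discard("load")  # force a clean reload on the next run
            save_checkpoint(ckpt_path, done)
            raise RuntimeError("row count mismatch; the load will be retried")
        done.add("verify")
        save_checkpoint(ckpt_path, done)
    return done


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        csv_path = os.path.join(d, "data.csv")
        with open(csv_path, "w", newline="") as f:
            f.write("a,b\n1,x\n2,y\n3,z\n")
        conn = sqlite3.connect(":memory:")  # stand-in for Vertica
        conn.execute("CREATE TABLE staging (a TEXT, b TEXT)")
        ckpt = os.path.join(d, "pipeline.ckpt.json")
        print(sorted(run_pipeline(conn, csv_path, ckpt)))
```

If verification fails, the load step is un-marked, so the next run clears the table and reloads it instead of appending duplicates. Counting CSV records with csv.reader rather than raw lines avoids miscounting quoted fields that contain embedded newlines.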
0 votes, 2 answers

Spark Kinesis Streaming Checkpoint Recovery: RDD NullPointerException

When resuming a failed job from a checkpoint, the application logic is invoked correctly and the RDDs are reinstantiated; however, a call to RDD.map results in a NullPointerException. lazy val ssc = StreamingContext.getOrCreate(checkpointDir,…
autodidacticon
0 votes, 0 answers

Implementing Checkpointing in Spark Streaming Job submitted using Spark Job Server

Implementing checkpointing when a Spark Streaming job is submitted directly to Spark seems straightforward. We are facing quite a few complexities when we need to do the same for a streaming job submitted through Spark Job Server. Any…
0 votes, 1 answer

TensorFlow train.Supervisor - save checkpoint upon training stop?

In TensorFlow 1.0, tf.train.Supervisor saves checkpoints at intervals of save_model_secs. Is there any way to save a checkpoint at the termination of training, rather than periodically during training?
Ron Cohen
0 votes, 0 answers

Configuring AWS S3 Object expiration policy for Apache Spark Streaming checkpoint directory

Does anyone have experience with expiration policies on S3 buckets that host Spark Streaming checkpoint directories? I have set up an application using Spark Streaming + Kafka, and I want to use an S3 bucket with a 24-hour expiration policy set to…
0 votes, 0 answers

Checkpointing of stateful operators in spark streaming

I am developing a streaming application that uses a mapWithState function internally. I need to set the checkpointing interval of my checkpointed data manually. This is my sample code: var newContextCreated = false // Flag to detect…
Mahdi
0 votes, 1 answer

Checkpointing on Spark Node failures

I have developed a fault-tolerant Spark Streaming application (which also has internal state) with checkpointing. This works when I exit my application and rerun it: everything (including the state) loads correctly. I wonder why in the case…
Mahdi
0 votes, 1 answer

Spark Checkpoint

I have created an RDD as follows: scala> val x = List(1,2,3,4) x: List[Int] = List(1, 2, 3, 4) scala> val y = sc.parallelize(ls,2) y: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[10] at parallelize at :29 scala> val z = y.map( c…
sraj
0 votes, 1 answer

Checkpointing several file streams in my Spark Streaming context

I have written a Spark Streaming application that needs to checkpoint various DStreams with underlying transformations. As suggested in this thread (Error in starting Spark streaming context), I have done all my transformations in…
Mahdi
0 votes, 0 answers

Trying to save DStream checkpoints to a location on Amazon S3

I want to save checkpoints to a location on Amazon S3. This is the part of my Scala code on the DStream, using the format below, but I am getting the error: Exception in thread "main" java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key…
Mahdi
0 votes, 1 answer

Accessing values from a restored Tensorflow variable

I have a simple recurrent network example, with a tf.Saver and weight, bias and state variables being saved. When the example is run with no options, it will initialise the state vector to contain zeros, but I want to pass a load_model option and it…
diffeomorphism
0 votes, 0 answers

Function-level Checkpoint-Recovery

I am reading about checkpointing. Based on what I have read so far, there are two main kinds of checkpointing: system-level checkpointing (SLC), which takes core-dump-style snapshots of computations; and application-level checkpointing (ALC), where programs are self-checkpointing…
user2090491
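To make the application-level idea concrete: in ALC the program itself decides which state to save and how to resume from it, whereas SLC tools such as CRIU snapshot the entire process. Below is a minimal Python sketch, assuming a toy loop whose whole state is a counter and a running total; the function names and the fail_at crash-simulation hook are hypothetical.

```python
import os
import pickle
import tempfile


def save_state(path, state):
    """Write state atomically so a crash mid-write never corrupts the checkpoint."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)  # fine for a program's own files; never unpickle untrusted data
    os.replace(tmp, path)


def load_state(path, default):
    """Return the saved state, or default when no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def sum_of_squares(n, ckpt_path, every=100, fail_at=None):
    """Compute sum(i * i for i in range(n)), checkpointing every `every` steps.

    fail_at is a hypothetical test hook that simulates a crash before step fail_at.
    """
    state = load_state(ckpt_path, {"i": 0, "total": 0})
    i, total = state["i"], state["total"]  # resume where the last checkpoint left off
    while i < n:
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"simulated crash at i={i}")
        total += i * i
        i += 1
        if i % every == 0:
            save_state(ckpt_path, {"i": i, "total": total})
    save_state(ckpt_path, {"i": i, "total": total})  # final state: a rerun is a no-op
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt = os.path.join(d, "state.pkl")
        try:
            sum_of_squares(1000, ckpt, fail_at=550)  # crashes after the i=500 checkpoint
        except RuntimeError as exc:
            print(exc)
        print(sum_of_squares(1000, ckpt))  # resumes from i=500; prints 332833500
```

Any work done after the last checkpoint is recomputed on resume, so the every parameter trades checkpoint-write overhead against the amount of lost work.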
0 votes, 1 answer

Docker suspend and resume using CRIU

I am building Docker from this version of the source code: https://github.com/boucher/docker/tree/cr-combined, after cloning the code: git clone -b cr-combined --single-branch https://github.com/boucher/docker.git cd docker #make build #make…
Walid Hanafy
0 votes, 0 answers

Automatic Simple HTML DOM with Check Point

Good evening, I want to ask about something I am not sure can be done. I have a scraping program built with Simple HTML DOM that extracts data from various websites. My question is: is it possible to automate the program to be able to…
Thane
0 votes, 2 answers

h2o deeplearning checkpoint model

Folks, I have a problem when trying to resume H2O deep learning in R from a checkpointed model with a validation frame provided. It says "Validation dataset must be the same as for the check pointed model", but I believe I do have the same validation…