Questions tagged [checkpointing]
105 questions
0
votes
1 answer
checkpointing in python to catch the runtime state
I am trying to make my code more self-healing. E.g., I execute method 1 to load data from a CSV file into the Vertica database. I have another method 2 to check whether the number of rows in the database and the number of lines in the CSV file are the same.…
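One way to picture the self-healing flow described here is a small checkpoint file that records which steps have finished, so a rerun skips completed work and retries only what failed. The sketch below is an illustration under assumptions, not the asker's code: `run_with_checkpoint`, `count_csv_rows`, and the JSON checkpoint layout are hypothetical names, and the database load is represented by a plain callable rather than a real Vertica call.

```python
import csv
import json
import os


def load_checkpoint(path):
    """Return the set of step names already completed (empty if no file)."""
    if not os.path.exists(path):
        return set()
    with open(path) as fh:
        return set(json.load(fh))


def save_checkpoint(path, done):
    """Write the checkpoint via a temp file so a crash never leaves a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(sorted(done), fh)
    os.replace(tmp, path)


def count_csv_rows(csv_path):
    """Count data rows in a CSV file, excluding the header line."""
    with open(csv_path, newline="") as fh:
        return max(sum(1 for _ in csv.reader(fh)) - 1, 0)


def run_with_checkpoint(steps, checkpoint_path):
    """Run (name, callable) steps in order, skipping those already recorded.

    Returns the names of the steps that actually executed in this call.
    """
    done = load_checkpoint(checkpoint_path)
    executed = []
    for name, func in steps:
        if name in done:
            continue
        func()  # an exception here leaves the checkpoint untouched
        done.add(name)
        save_checkpoint(checkpoint_path, done)
        executed.append(name)
    return executed
```

In this sketch the verification step (comparing `count_csv_rows` with a row count queried from the database) would simply raise on a mismatch; the next run then resumes at that step instead of reloading the data.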

namratha Mk
- 23
- 5
0
votes
2 answers
Spark Kinesis Streaming Checkpoint Recovery: RDD nullpointer exception
When resuming a failed job from a checkpoint, application logic is invoked correctly and RDDs are reinstantiated; however, a call to RDD.map results in a NullPointerException.
lazy val ssc = StreamingContext.getOrCreate(checkpointDir,…

autodidacticon
- 1,310
- 2
- 14
- 33
0
votes
0 answers
Implementing Checkpointing in Spark Streaming Job submitted using Spark Job Server
Implementing checkpointing when a Spark Streaming job is submitted directly to Spark seems straightforward. We are facing considerable complexity when we need to do the same for a streaming job submitted using Spark Job Server. Any…
0
votes
1 answer
TensorFlow train.Supervisor - save checkpoint upon training stop?
In TensorFlow 1.0, tf.train.Supervisor saves checkpoints at intervals of save_model_secs. Is there any way to save a checkpoint at the termination of training, rather than periodically during training?
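The general pattern behind this question, independent of TensorFlow, is to wrap the training loop so that a final save runs on the way out in addition to any periodic ones. The following is a library-agnostic sketch and not the `tf.train.Supervisor` API: `train`, `step_fn`, `save_fn`, and `save_every` are hypothetical names chosen for illustration.

```python
def train(num_steps, step_fn, save_fn, save_every=None):
    """Run step_fn for num_steps; always call save_fn once on exit.

    save_every=None disables periodic saves, leaving only the final one.
    """
    step = 0
    try:
        while step < num_steps:
            step_fn(step)
            step += 1
            if save_every and step % save_every == 0:
                save_fn(step)  # periodic checkpoint
    finally:
        # Runs on normal completion, on exceptions, and on KeyboardInterrupt.
        save_fn(step)
    return step
```

Passing `save_every=None` corresponds to the behaviour asked about: no periodic checkpoints, only one when training stops.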

Ron Cohen
- 2,815
- 5
- 30
- 45
0
votes
0 answers
Configuring AWS S3 Object expiration policy for Apache Spark Streaming checkpoint directory
Does anyone have experience with expiration policies on the S3 buckets hosting Spark Streaming checkpoint directories? I have set up an application using Spark Streaming + Kafka and I want to use an S3 bucket with a 24-hour expiration policy set to…
0
votes
0 answers
Checkpointing of stateful operators in spark streaming
I am developing a streaming application which uses the mapWithState function internally.
I need to set the checkpointing interval of my checkpointed data manually.
This is my sample code:
var newContextCreated = false // Flag to detect…

Mahdi
- 787
- 1
- 8
- 33
0
votes
1 answer
Checkpointing on Spark Node failures
I have developed a Spark Streaming application (which also has internal state) with checkpointing and fault tolerance. This works when I exit my application and re-run it: everything (including the state) loads correctly.
I wonder why in the case…

Mahdi
- 787
- 1
- 8
- 33
0
votes
1 answer
Spark Checkpoint
I have created an RDD as follows:
scala> val x = List(1,2,3,4)
x: List[Int] = List(1, 2, 3, 4)
scala> val y = sc.parallelize(ls,2)
y: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[10] at parallelize at :29
scala> val z = y.map( c…

sraj
- 9
- 1
0
votes
1 answer
Check-pointing several filestreams in my spark streaming context
I have written a Spark Streaming application which needs to do some checkpointing on various DStreams that have underlying transformations. As suggested in this thread (Error in starting Spark streaming context), I have done all my transformations in…

Mahdi
- 787
- 1
- 8
- 33
0
votes
0 answers
Trying to save DStream checkpoints in a location on Amazon S3
I want to save checkpoint tests in a location on Amazon S3. This is the part of my Scala code on the DStream, using the format below, but I am getting this error:
Exception in thread "main" java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key…

Mahdi
- 787
- 1
- 8
- 33
0
votes
1 answer
Accessing values from a restored Tensorflow variable
I have a simple recurrent network example, with a tf.train.Saver and the weight, bias, and state variables being saved.
When the example is run with no options, it initialises the state vector to zeros, but I want to pass a load_model option and it…

diffeomorphism
- 991
- 2
- 10
- 27
0
votes
0 answers
Function-level Checkpoint-Recovery
I am reading about checkpointing. Based on what I have read so far, there are two main kinds of checkpointing:
System-level checkpointing (SLC) – core-dump-style snapshots of computations
Application-level checkpointing (ALC) – programs are self-checkpointing…
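To make the application-level variant concrete: the program itself decides which state matters, writes it out at chosen points, and reloads it on restart. The sketch below assumes a simple running-sum loop; `checkpointed_sum`, `stop_after`, and the pickle file layout are hypothetical and exist only for illustration.

```python
import os
import pickle


def checkpointed_sum(values, path, stop_after=None):
    """Sum values, saving (next_index, total) to path after each item.

    stop_after simulates a crash by returning None once that many items
    have been processed in this call.
    """
    index, total = 0, 0
    if os.path.exists(path):
        with open(path, "rb") as fh:
            index, total = pickle.load(fh)  # resume from the saved state

    processed = 0
    while index < len(values):
        if stop_after is not None and processed >= stop_after:
            return None  # simulated failure; the checkpoint stays on disk
        total += values[index]
        index += 1
        processed += 1
        tmp = path + ".tmp"
        with open(tmp, "wb") as fh:
            pickle.dump((index, total), fh)
        os.replace(tmp, path)  # atomic swap keeps the checkpoint consistent

    return total
```

A system-level checkpoint would instead snapshot the whole process from outside, without the program knowing which variables matter; here only the two values needed to resume are saved.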

user2090491
- 568
- 2
- 5
- 15
0
votes
1 answer
docker suspend and resume using criu
I am building Docker from this branch of the source code:
https://github.com/boucher/docker/tree/cr-combined
After cloning the code:
git clone -b cr-combined --single-branch https://github.com/boucher/docker.git
cd docker
#make build
#make…

Walid Hanafy
- 1,429
- 2
- 14
- 26
0
votes
0 answers
Automatic Simple HTML DOM with Check Point
Good evening,
I wanted to ask about something that I'm not sure can be done.
I have a scraping program made with Simple HTML DOM.
The program extracts data from various websites.
So my question is:
Is it possible to automate the program to be able to…

Thane
- 67
- 3
0
votes
2 answers
h2o deeplearning checkpoint model
Folks,
I have a problem when trying to resume H2O deep learning in R from a checkpointed model with a validation frame provided. It says "Validation dataset must be the same as for the check pointed model", although I believe I do have the same validation…

russiancube
- 11
- 3