Highest Voted 'checkpointing' Questions

3 votes, 1 answer

pytorch torch.load load_checkpoint and learning_rate

Following this Medium post, I understand how to save and load my model (or at least I think I do). They say the learning_rate is saved. However, looking at this person's code (it's a GitHub repo with lots of people watching, forking, etc. so I'm…

python pytorch checkpointing

asked Mar 31 '22 at 13:52

FluidMechanics Potential Flows

594
10
23

3 votes, 1 answer

Azure Event Hubs Streaming: Does Checkpointing override setStartingPosition?

If we specify the starting position in EventHub conf like so: EventHubsConf(ConnectionStringBuilder(eventHubConnectionString).build) .setStartingPosition(EventPosition.fromStartOfStream) or …

spark-structured-streaming azure-eventhub event-driven-design checkpointing

asked Feb 18 '21 at 23:57

Gadam

2,674
8
37
56

3 votes, 2 answers

TF Keras ModelCheckpoint filepath batch number

I am using ModelCheckpoint to save checkpoints every 500 batches in every epoch. It is documented here https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint. How would I set filepath to include the batch number? I know I can…

tensorflow keras checkpointing

asked Sep 20 '19 at 00:26

rishai

495
3
15

3 votes, 0 answers

Spark Structured Streaming: Error reading delta file with hdfs checkpoint location

I want to run a Spark Structured Streaming job locally on a single machine. Unfortunately, recovering from an aborted job does not work when the job was aborted while processing data (it fails with the log shown below). (If the streaming job is…

spark-structured-streaming checkpointing

asked Jul 08 '19 at 11:17

Esparko

31
1

3 votes, 0 answers

Is there a way to export/checkpoint OpenCV Background Subtraction for later use?

Is there a way to export/checkpoint OpenCV Background Subtraction for later use? I have some very long video files to process which require background removal. I would like to cut the video into small chunks and process each chunk separately. …

python opencv computer-vision background-subtraction checkpointing

asked Apr 09 '19 at 20:23

WesH

460
5
15

3 votes, 1 answer

MapWithState gives java.lang.ClassCastException: org.apache.spark.util.SerializableConfiguration cannot be cast while recovering from checkpoint

I am facing an issue with a Spark Streaming job where I am trying to use broadcast, mapWithState and checkpointing together in Spark. Following is the usage: Since I have to pass some connection object (which is not Serializable) to the executors, I…

apache-spark serialization spark-streaming broadcast checkpointing

asked Aug 01 '17 at 16:54

Saman

53
6

3 votes, 2 answers

Is checkpointing necessary in spark streaming

I have noticed that Spark Streaming examples also have code for checkpointing. My question is: how important is that checkpointing? If it's there for fault tolerance, how often do faults happen in such streaming applications?
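To make the fault-tolerance idea in this question concrete, here is a minimal sketch in plain Python. It is not Spark's checkpointing mechanism; it only illustrates the general pattern of persisting an offset and running state so that a restarted job resumes from the last checkpoint instead of starting over. All names (`save_checkpoint`, `load_checkpoint`, `process`, `demo`, `crash_at`) are hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    # os.replace is atomic on the same filesystem, so a reader never
    # sees a half-written checkpoint.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"offset": 0, "total": 0}


def process(events, path, every=3, crash_at=None):
    """Sum events, checkpointing after every `every` records.

    `crash_at` (hypothetical) simulates a failure just before the
    record at that offset is processed.
    """
    state = load_checkpoint(path)
    for offset in range(state["offset"], len(events)):
        if crash_at is not None and offset == crash_at:
            raise RuntimeError("simulated failure")
        state["total"] += events[offset]
        state["offset"] = offset + 1
        if state["offset"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]


def demo():
    """Crash mid-stream, then restart and resume from the checkpoint."""
    events = list(range(1, 11))  # 1..10
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "state.json")
        try:
            process(events, path, crash_at=7)
        except RuntimeError:
            pass  # the "job" died; only the checkpoint file survives
        resumed_from = load_checkpoint(path)["offset"]
        total = process(events, path)
    return resumed_from, total
```

In this sketch the work done after the last checkpoint is lost in the crash and simply redone on restart, which is the trade-off the checkpoint interval controls: frequent checkpoints cost more I/O, infrequent ones mean more reprocessing after a failure.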

scala apache-spark checkpointing

asked Sep 20 '16 at 16:59

pythonic

20,589
43
136
219

3 votes, 2 answers

h2o deeplearning checkpoint

I'm trying to run h2o.deeplearning twice, using checkpoint parameter on 2 train sets (using same parameters except different epochs). I'm getting the following error: Error: 'The columns of the training data must be the same as for the checkpointed…

machine-learning neural-network deep-learning h2o checkpointing

asked Dec 31 '15 at 15:34

eli

81
5

3 votes, 1 answer

Apache Spark - accessing internal data on RDDs?

I started doing the amp-camp 5 exercises. I tried the following 2 scenarios: Scenario #1 val pagecounts = sc.textFile("data/pagecounts") pagecounts.checkpoint pagecounts.count Scenario #2 val pagecounts =…

apache-spark rdd checkpointing

asked Sep 30 '15 at 08:35

Jatin Ganhotra

6,825
6
48
71

2 votes, 2 answers

Transparently replace file mapping with anonymous

I am doing a checkpoint-and-restore using CRIU; in turn after restore, my application wakes with some threads that have their stack mmapped into files on disk (CRIU doesn't do it by default, this is a custom optimization). Later on, I want to…

pthreads mmap ptrace checkpointing criu

asked Apr 05 '23 at 09:51

Radim Vansa

5,686
2
25
40

2 votes, 1 answer

How to restore a specific checkpoint in tensorflow2 (to implement early stopping)?

I used the following code to create a checkpoint manager outside of the loop that I train my model: checkpoint_path = "./checkpoints/train" ckpt = tf.train.Checkpoint(object_1=object_1) ckpt_manager = tf.train.CheckpointManager(ckpt,…

python tensorflow tensorflow2.0 checkpointing

asked Jul 15 '20 at 16:11

khemedi

774
3
9
19

2 votes, 0 answers

snakemake checkpoint calling variable not defined

I have the below snakefile with checkpoints. I am trying to run this for 2 samples (defined as RUNS). However, every time I try I'm getting an additional variable included. Any thoughts on how to resolve this? Thank you. import os from tempfile…

python snakemake checkpointing

asked May 07 '20 at 17:51

Susheel Busi

163
8

2 votes, 0 answers

Apache beam job on Flink checkpoint size growing over time

One of our Apache Beam jobs running through the FlinkRunner is experiencing odd behavior with checkpoint size. The state backend is file-based. The job receives traffic once a day for a period of an hour and then is idle until it receives more…

apache-flink apache-beam checkpointing

asked Apr 21 '20 at 14:51

TheFlyingFox

31
4

2 votes, 1 answer

RuntimeError: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at torch/csrc/cuda/Module.cpp:51

When I try to load a pytorch checkpoint: checkpoint = torch.load(pathname) I see: RuntimeError: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at torch/csrc/cuda/Module.cpp:51 I created the checkpoint with…

python pytorch checkpointing

asked Apr 19 '19 at 09:05

Tom Hale

40,825
36
187
242

2 votes, 1 answer

How to set the setCheckpoint in pyspark

I don't know much Spark. At the top of the code I have from pysaprk.sql import SparkSession import pyspark.sql.function as f spark = SparkSession.bulder.appName(‘abc’).getOrCreate() H = sqlContext.read.parquet(‘path to hdfs file’) H has about 30…

apache-spark-sql checkpointing

asked Feb 17 '19 at 04:42

pmjn6

307
1
4
14
