Highest Voted 'checkpointing' Questions

2 votes, 1 answer

Checkpoint in Declarative Jenkins Pipeline

I am looking at CloudBees documentation that says: "The correct approach is to always keep the checkpoint step outside of any node block, not associated with either an agent or a workspace." The example given is for a scripted pipeline. I…

asked Aug 08 '18 at 18:44 by Ram (reputation 173)

2 votes, 2 answers

Where is the default checkpoint(s) kept in Apache Flink?

I am new to Apache Flink and was going through the Apache Flink examples. I found that, in case of a failure, Flink can restore stream processing from a checkpoint. StreamExecutionEnvironment env =…

java apache-flink flink-streaming checkpointing

asked May 13 '18 at 15:46 by himanshuIIITian (reputation 5,985)

2 votes, 0 answers

Reconnecting to MPI after Linux process state is restored

Storytime! Consider the following scenario: using Hydra, MPICH spawns two different processes (simulators). Call them Apple and Orange! Apple and Orange start, load a dynamically linked library, and use that library to call MPI_Init and do…

mpi mpich checkpointing

asked Mar 28 '18 at 17:34 by MehMastah (reputation 21)

2 votes, 0 answers

TensorFlow Checkpoints for Online Learning

I am trying to build an adaptable speech recognition system based on Mozilla DeepSpeech (a TensorFlow implementation of the DeepSpeech paper). The idea is that we will pretrain a model on a certain voice, then save the model and create a…

tensorflow deep-learning checkpointing

asked Feb 20 '18 at 13:26 by Gopala Krishna Char (reputation 212)

2 votes, 1 answer

S3 Checkpoint with Structured Streaming

I have tried the suggestions given in "Apache Spark (Structured Streaming) : S3 Checkpoint support", but I am still facing this issue. Below is the error I get: 17/07/06 17:04:56 WARN FileSystem: "s3n" is a deprecated filesystem name. Use…

java apache-spark amazon-s3 spark-structured-streaming checkpointing

asked Jul 07 '17 at 14:30 by fledgling (reputation 991)

2 votes, 0 answers

Failure to reload from checkpoint directory

When I tried reloading my Spark Streaming application from a checkpoint directory, I got the following exception: java.lang.IllegalArgumentException: requirement failed: Checkpoint directory does not exist:…

spark-streaming reload illegalargumentexception checkpointing

asked Sep 29 '16 at 04:09 by mahdi62 (reputation 959)

2 votes, 0 answers

Read Spark Streaming checkpoint data

I'm writing a Spark Streaming application that reads from Kafka. To get exactly-once semantics, I'd like to use the direct Kafka stream together with Spark Streaming's native checkpointing. The problem is that checkpointing makes practically…

apache-kafka spark-streaming checkpointing

asked Sep 16 '16 at 08:11 by mgaido (reputation 2,987)

2 votes, 1 answer

Variable scopes in Tensorflow

I am having problems making effective use of variable scopes. I want to define some variables for the weights, biases, and inner state of a simple recurrent network. I call get_saver() once after defining the default graph. I then iterate over a batch…

python-2.7 tensorflow checkpointing

asked Jun 07 '16 at 13:03 by diffeomorphism (reputation 991)

2 votes, 1 answer

What does checkpointing do on Apache Spark?

What does checkpointing do in Apache Spark, and does it incur any RAM or CPU overhead?

hadoop apache-spark pyspark checkpointing

asked Apr 14 '16 at 19:34 by cshin9 (reputation 1,440)

1 vote, 0 answers

SSIS checkpoints are not re-starting correctly, skipping NON-checkpointed tasks

I have an SSIS package where the checkpoints are not behaving as I understand they should. To simplify, this is the kind of setup: imagine a package with two containers in a serial flow (Container 1 executes, then Container 2). Checkpoints are…

ssis sql-server-2016 checkpointing

asked Jan 23 '23 at 17:48 by Lee Cascio (reputation 11)

1 vote, 1 answer

How to configure checkpointing on an XTDB node using AWS S3

I am using XTDB 1.21.0 deployed on AWS/ECS (Fargate) with checkpoints configured (frequency 30 minutes) and stored on an S3 bucket (RocksDB). After a couple of successful checkpoints, they seem to be constantly failing with an XTDB warning due to an…

database amazon-s3 clojure checkpointing xtdb

asked Jun 27 '22 at 14:04 by modality (reputation 21)

1 vote, 0 answers

Flink AT_LEAST_ONCE checkpoint uses 100% managed memory

We have a Flink 1.14 streaming job running in native Kubernetes deployment mode. When we use the AT_LEAST_ONCE checkpointing mode, managed memory usage hits 100% no matter how much memory we assign to it. Any ideas what might be the cause, or is this…

apache-flink flink-streaming checkpointing

asked Nov 23 '21 at 17:25 by 周天钜 (reputation 33)

1 vote, 1 answer

Flink checkpointing working for ProcessFunction but not for AsyncFunction

I have operator checkpointing enabled and working smoothly for a ProcessFunction operator. On job failure, I can see operator state being externalized in the snapshotState() hook, and on resume, I can see state being restored at the…

apache-flink checkpointing

asked Nov 15 '21 at 16:39 by diegoruizbarbero (reputation 101)

1 vote, 1 answer

Apache Flink to use S3 for backend state and checkpoints

Background: I was planning to use S3 to store Flink's checkpoints using the FsStateBackend, but I was getting the following error. Error: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system…

amazon-s3 apache-flink flink-streaming checkpoint checkpointing

asked Oct 06 '20 at 13:15 by Keshav Lodhi (reputation 2,641)

1 vote, 0 answers

Apache Flink losing records when task manager is restarted

I am using a Flink cluster with a job manager pod and two task manager pods in a Kubernetes cluster. When I submit the streaming job to the job manager, it runs the job and I receive the output in the sink. I have also enabled checkpointing to…

apache-flink flink-streaming checkpointing

asked Oct 01 '20 at 12:28 by user3553913 (reputation 373)

Questions tagged [checkpointing]