Questions tagged [blcr]

Berkeley Lab Checkpoint/Restart is a hybrid kernel/user implementation of checkpoint/restart. See http://crd.lbl.gov/groups-depts/ftg/projects/current-projects/BLCR for details.

10 questions
3
votes
0 answers

RPi BLCR/MPICH Checkpoint/Restart issue

After investigating my problem for weeks, I have found some information in the hexdump of the context (I got one without a C/R error (links at the end of this question, but no restart success)) (context-num0-0-0,…
x4k3p
  • 1,598
  • 2
  • 22
  • 42
2
votes
2 answers

mpiexec checkpointing error (RPi)

When I try to run an application (even a simple hello_world.c doesn't work), I receive this error every time: mpiexec -ckpointlib blcr -ckpoint-prefix /tmp/ -ckpoint-interval 10 -machinefile /tmp/machinefile -n 1 ./app_name [proxy:0:0@masterpi]…
x4k3p
  • 1,598
  • 2
  • 22
  • 42
1
vote
1 answer

Checkpointing and restarting X11 applications

I want to checkpoint and restart X11 applications. I am using the Berkeley Lab Checkpoint/Restart (BLCR) tool. BLCR is not able (without modifications) to reinitiate the connection to the X server. I used an interposition library to log all…
OldMacDonald
  • 77
  • 2
  • 7
1
vote
1 answer

OpenMPI on Raspberry Pi with Checkpoint/Restart support

I have a simple question. Does OpenMPI on the Raspberry Pi (i.e. ARM) provide the Checkpoint/Restart feature? I have MPICH with BLCR, but I can't restart any application (I built MPICH and BLCR myself), so I would like to try OpenMPI. (yes I mean…
x4k3p
  • 1,598
  • 2
  • 22
  • 42
1
vote
0 answers

Restart an MPI slave after checkpoint before failure on ARMv6

UPDATE I have a university project in which I should build a cluster with RPis. Now we have a fully functional system with BLCR/MPICH on it. BLCR works very well with normal processes linked with the lib. Demonstrations we have to show from our…
x4k3p
  • 1,598
  • 2
  • 22
  • 42
1
vote
1 answer

Torque BLCR checkpoint with static linked executable

I am trying to checkpoint jobs being handled by the torque job scheduler using the Berkeley Lab checkpointing (BLCR) scheme and I am having errors thrown when attempting cr_run 'my_exec' because I believe that the executable was statically linked at…
codeAndStuff
  • 507
  • 6
  • 19
1
vote
0 answers

Questions about torque checkpoint MPI jobs with BLCR

We're trying to use torque to checkpoint MPI jobs, but it seems that torque can only handle jobs running on a single node. I checked the code and found that when using qhold to checkpoint a job, qhold sends a PBS_BATCH_HoldJob request to pbs server,…
levin li
  • 391
  • 3
  • 10
0
votes
0 answers

BLCR install on CentOS

I'm working on adding checkpointing to my program, and I found that BLCR might help me. I want to install it on my server. I downloaded version 0.8.5; however, it raises Configure error: Unable to use Kernal 3.10.0-1160.el7.x86_64. How can I…
Error
  • 1
  • 1
0
votes
0 answers

Use linked file as input to linker

Does Linux have a way to use a statically linked file as input to the linker? Specifically, I'd like to add the BLCR libcr library to a statically linked program for which I do not have access to the source code.
GeneF
  • 51
  • 1
  • 2
0
votes
1 answer

How to destroy(clean, reset) CUDA application completely during process running

I plan to write a Checkpoint/Restart library for CUDA applications with BLCR. To do this, I have to destroy the CUDA application completely while the process is running, because BLCR fails to run cr_checkpoint if the process remains on the GPU. Actually, I…
user2779344
  • 220
  • 1
  • 10