I am planning to build a checkpoint/restart library for CUDA applications using BLCR.
To do this, I have to tear down the application's CUDA state completely while the process is still running,
because BLCR fails to run cr_checkpoint if the process still holds resources on the GPU.
As a test, I called cudaDeviceReset() at some point and then called sleep(1000). During the sleep, I checkpointed the process like this: cr_checkpoint PID
In that case, the context.PID file was created successfully, but restarting from it failed when I ran this: cr_run context.PID
The error message is as follows:
-mmap(0, 200000000, 2700000000, ...) = 0xfffffffffffffff4 (failed)
-thaw_threads returned error, aborting. -12
Restart failed: Cannot allocate memory
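
The test described above can be sketched roughly as follows. This is an illustrative reconstruction, not the exact code: the helper names release_gpu, count_mappings and wait_for_checkpoint are hypothetical, and the USE_CUDA macro is an assumption that lets the same file build without the CUDA toolkit (the GPU teardown then becomes a no-op). The count_mappings helper is an extra diagnostic that was not part of the original test; it assumes Linux and assumes that any driver mappings still attached to the process show up in /proc/self/maps with "nvidia" in their path.

```c
/* Assumed build with CUDA:    cc -DUSE_CUDA sketch.c -lcudart
 * Assumed build without CUDA: cc sketch.c   (GPU teardown is a stub) */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#ifdef USE_CUDA
#include <cuda_runtime_api.h>
#endif

/* Hypothetical helper: release what this process holds on the GPU.
 * Returns 0 on success, -1 if a CUDA call reports an error. */
int release_gpu(void)
{
#ifdef USE_CUDA
    /* Wait for outstanding work, then destroy the device context. */
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;
    if (cudaDeviceReset() != cudaSuccess) return -1;
#endif
    return 0;
}

/* Hypothetical diagnostic: count memory mappings of this process whose
 * /proc/self/maps line contains `needle`. Returns -1 if the file cannot
 * be opened (for example on a non-Linux system). */
int count_mappings(const char *needle)
{
    assert(needle != NULL);
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL) return -1;

    char line[4096];
    int hits = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        if (strstr(line, needle) != NULL) hits++;
    }
    fclose(maps);
    return hits;
}

/* Hypothetical helper: idle so that cr_checkpoint can be run against this
 * PID from another shell. Returns the number of seconds actually slept;
 * sleep() returns early if a signal handler runs. */
unsigned wait_for_checkpoint(unsigned seconds)
{
    printf("PID %ld idle for %u s\n", (long)getpid(), seconds);
    fflush(stdout);
    unsigned left = sleep(seconds);
    return seconds - left;
}
```

Calling release_gpu(), then count_mappings("nvidia"), then wait_for_checkpoint(1000) from main follows the same sequence as the test. A non-zero count after the reset would suggest that the process is still attached to the driver, which may be related to the mmap failure on restart, although this sketch does not confirm that.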
Does anyone have any idea about this? To summarize:
- I plan to build a checkpoint/restart library for CUDA applications using BLCR.
- I tried calling cudaDeviceReset() before the checkpoint, but the restart failed (the context.PID file was created, but restarting from it did not work).
- I want to know how to destroy or reset an application's CUDA state completely while the process is running.
I would appreciate any ideas.