I am building 1.10.0-dev from source with the cr-defunct checkpoint/restore code (based on feedback from Ross Boucher) to get checkpoint/restore functionality.
When I checkpoint a container that has no active TCP connections and then restore it into a newly created container, I have no problems. However, if there is an active TCP connection, the restore fails. The failure may have some other cause, I am not sure, but a TCP error stands out in restore.log. Here is how I reproduce it.
Start a Docker container, using alpine-sshd as the base image:
docker run -d --security-opt seccomp:unconfined --name a1 alpine-sshd
Then I ssh into the container. I have already set up the user:
ssh abc@172.17.0.2
So now there is an active TCP connection on port 22 for that container, which I can verify by entering the container and running "netstat -na".
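The same check can be run from the host by piping the netstat output through a small filter. This is only a sketch: the helper name established_on_port is made up for illustration, and it assumes the usual netstat column layout (local address in column 4, state in column 6).

```shell
# Hypothetical helper: print only ESTABLISHED TCP lines whose local port is $1.
# Assumes netstat's usual columns: proto, recv-q, send-q, local, foreign, state.
established_on_port() {
    awk -v port="$1" '$1 ~ /^tcp/ && $4 ~ (":" port "$") && $6 == "ESTABLISHED"'
}

# Usage (run on the host while the ssh session is still open):
#   docker exec a1 netstat -na | established_on_port 22
```

If the ssh session is open, this should print at least one line for port 22; no output means no established connection was found.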
Now I create a new container (without starting it) using:
docker create --security-opt seccomp:unconfined --name=a3 alpine-sshd
"docker ps -a" reveals two containers, a1 and a3
Next, I checkpoint the a1 container. The --leave-running flag makes no difference, since it is not used during the restore, which is where the actual error occurs:
docker checkpoint --image-dir=/tmp/ABC a1
Then I restore into a3 using /tmp/ABC:
docker restore --force=true --image-dir=/tmp/ABC a3
This causes the following error:
Error response from daemon: Cannot restore container a3: cantstart: Cannot start container c40adc.....<snip ID>...: criu failed: type NOTIFY error 0
log file: /var/lib/docker/0.0/containers/c40adc...<snip ID>../criu.work/restore.log
The restore.log has the following notable errors:
14: Restoring TCP connection
14: Restoring TCP connection id 13 ino 153c9
14: Setting 1 queue seq to 2533629009
14: Setting 2 queue seq to 1507997351
14: Error (sk-inet.c:721): Can't bind inet socket (id 19): Cannot assign requested address
10: Error (cr-restore.c:1350): 14 exited, status=1
At the bottom of the log file:
10: Restored
Error (cr-restore.c:1352): 20710 killed by signal 9
Error (cr-restore.c:2182): Restore failed
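To pull just the error lines out of a long restore.log, a simple filter like the one below is enough. It is a sketch, not part of CRIU: the function name criu_errors is hypothetical, and the path passed to it should be the log file reported in the daemon error (left elided here).

```shell
# Hypothetical helper: print, with line numbers, every line of a CRIU log
# that contains an "Error (" marker such as "Error (sk-inet.c:721)".
criu_errors() {
    grep -n -F 'Error (' "$1"
}

# Usage: pass the restore.log path printed in the daemon error message, e.g.
#   criu_errors /path/to/criu.work/restore.log
```

The line numbers make it easier to quote the surrounding context when posting a larger excerpt.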
Now, I don't necessarily need the networking to be restored, although it would be useful to have. Right now, I just want a stable restore from a checkpoint taken while the container had active network connections.
Note that if I run this entire sequence without the ssh/TCP connection, it works fine.
Any help would be greatly appreciated. I can provide the full restore.log and other files if needed. Thanks in advance.
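For anyone who wants to repeat the sequence, the steps can be wrapped in a small script. This is a sketch using only the commands from this report; the function and variable names (start_source, checkpoint_into_new, SRC, DST, IMAGE, IMAGE_DIR) are made up for illustration, and the ssh step stays manual because the session has to remain open during the checkpoint.

```shell
#!/bin/sh
# Names and paths below are the ones used in this report.
SRC=a1
DST=a3
IMAGE=alpine-sshd
IMAGE_DIR=/tmp/ABC

# Step 1: start the source container.
start_source() {
    docker run -d --security-opt seccomp:unconfined --name "$SRC" "$IMAGE"
}

# Step 2 (manual, from another terminal; leave the session open to trigger
# the failure, or skip this step for the working case):
#   ssh abc@172.17.0.2

# Step 3: create the target (not started), checkpoint the source, restore.
checkpoint_into_new() {
    docker create --security-opt seccomp:unconfined --name="$DST" "$IMAGE" &&
    docker checkpoint --image-dir="$IMAGE_DIR" "$SRC" &&
    docker restore --force=true --image-dir="$IMAGE_DIR" "$DST"
}
```

Source the file, run start_source, optionally open the ssh session, then run checkpoint_into_new. Keeping the ssh step outside the functions makes it easy to compare the failing and working cases.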