
Two VMs, VM1 and VM2, are hosted on the same machine. Each VM hosts a runC container running a server application. A client runs directly on the host machine and is connected to the service running in the container on VM1.

I want to build a failover service: if I stop the container on VM1, the client should get connected to the container running on VM2.

I've implemented and tested a floating IP using keepalived and VRRP as suggested in this link, but this works only when the whole VM fails. I need fault tolerance at the container level: if the container on VM1 fails, the client should get connected to a replica running on VM2 even if VM1 itself is still up, i.e. the container fails but the VM hosting it stays up.

The restriction is that I do not want to use any load balancer or HAProxy service, since that could itself become a single point of failure.

Any idea how I can achieve this? Or is it not possible?

EDIT: (As suggested in the comments, adding detail to eliminate possible confusion.) The container on VM2 is not running from the time the primary container, i.e. the one on VM1, is started. There is a container checkpoint/restore utility called criu; you can look at it here and here. Using this utility, VM1's container is migrated to VM2 by first checkpointing the container, then transferring that state to VM2 and restoring it there. It is also necessary to ensure that the client gets connected to the container running on VM2. Until now I was just running the container in a network namespace and adding a route from VM1's interface. With this, using keepalived and VRRP as mentioned earlier, I can handle a failure of VM1 itself. What I want now is this: when the container goes down while VM1 stays up, and its migrated state is restored on VM2, how do I make sure that the client gets connected to the now-running container on VM2? I have to simulate this scenario.
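
One way to cover the container-level case is to probe the service itself rather than the VM. The sketch below is a minimal Python health check, under assumptions not stated above: that the containerized server accepts TCP connections, and that it is reachable from the VM at a hypothetical address and port (`127.0.0.1:8080` here). keepalived can run such a script through a `vrrp_script` block referenced from `track_script`, treating a non-zero exit status as a failed check, so the floating IP could move to VM2 when the container stops even though VM1 stays up. Whether this lines up with the criu restore timing on VM2 would need testing.

```python
#!/usr/bin/env python3
"""Exit 0 if the container's TCP service accepts connections, 1 otherwise."""
import argparse
import socket
import sys


def container_is_healthy(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # A successful connect means something is listening on the port;
        # it does not prove the application behind it is behaving correctly.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def main() -> int:
    parser = argparse.ArgumentParser(description="TCP health check for a container")
    # The defaults below are hypothetical; use the address the container listens on.
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8080)
    parser.add_argument("--timeout", type=float, default=1.0)
    args = parser.parse_args()
    healthy = container_is_healthy(args.host, args.port, args.timeout)
    return 0 if healthy else 1


if __name__ == "__main__":
    sys.exit(main())
```

The script only reports health through its exit status, so the decision to move the floating IP stays with keepalived and no extra proxy sits between the client and the containers.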

y_159
  • Why are you trying to reinvent the wheel instead of using something like Kubernetes? – Michael Hampton Jan 04 '21 at 18:18
  • @MichaelHampton It's part of a project where I have to use a simple single container, not Kubernetes. The replica container on VM2 is only started/restored using the `criu` utility when the primary one goes down, so it's not like a cluster service. I've left out many details of the project that could lead to unwanted discussion and confuse readers, to keep the question simple and easy to understand. – y_159 Jan 04 '21 at 19:05
  • If you still want to know, please look at `criu` (https://github.com/checkpoint-restore/criu, https://criu.org/Main_Page), a project which can checkpoint and restore containers. – y_159 Jan 04 '21 at 19:08
  • You should _include_ such details, so as to avoid confusion and unwanted discussion. – Michael Hampton Jan 04 '21 at 19:14

0 Answers