Two VMs, VM1 and VM2, are hosted on the same machine. Each VM hosts a runC container running a server application. A client runs directly on the host machine and is connected to the service running in the container on VM1.
I want to build a failover service: if I stop the container on VM1, the client should get connected to the container running on VM2.
I've implemented and tested a floating IP using keepalived and VRRP, as suggested in this link, but this works only when the whole VM fails. I need fault tolerance at the container level: if the container on VM1 fails, the client should get connected to a replica running on VM2 even when VM1 itself is still up, i.e. the container fails but the VM hosting it does not.
The restriction is that I do not want to use a load balancer or HAProxy-style service, since it could itself become a single point of failure.
Any idea how I can achieve this? Or is it not possible to do this?
EDIT: (As suggested in the comments, adding detail to eliminate possible confusion.) The container on VM2 has not been running since the primary container, i.e. the one on VM1, was started. There is a container checkpoint/restore utility called CRIU; you can look at it here and here. Using this utility, VM1's container is migrated to VM2 by first checkpointing the container, then transferring that state to VM2 and restoring it there. But it is now also necessary to ensure that the client gets connected to the container running on VM2.
Until now I was just running the container in a network namespace and adding a route from VM1's interface. With this, I can handle a failure of VM1 itself (such as a hardware failure) using keepalived and VRRP, as mentioned earlier. What I want now is this: when the container's state is migrated to VM2, the container on VM1 goes down while VM1 stays up, and the migrated state is restored on VM2, how do I make sure that the client gets connected to the now-running container on VM2? I have to simulate this scenario.
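To make the failure case concrete: the condition I need to detect is "the VM is reachable, but the container's service is not". Below is a minimal sketch of such a probe, assuming the server application listens on a TCP port. The function name `container_is_up`, the default address `127.0.0.1`, and the port `8080` are hypothetical placeholders, not values from my actual setup.

```python
import socket
import sys


def container_is_up(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    A refused or timed-out connection is treated as "container down",
    even though the VM that hosts it may still be running.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    # Hypothetical defaults; pass the real service address and port as arguments.
    host = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 8080
    # Exit status 0 means the service answered, 1 means it did not.
    sys.exit(0 if container_is_up(host, port) else 1)
```

This only detects the container-level failure. How to use such a signal to move the client over to VM2 without a load balancer is the part I am asking about.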