microservices with multiregion HA

Question

when using microservices and microservice A wants to talk to microservice B there is some kind of loadbalancing as we can have multiple instances of B. it can be either infrastructure LB (kubernetes) or client side LB (eureka + ribbon). it's pretty simple when everything is deployed in a single region and AZ.

what happens when we want to achieve multi zone HA and use closest region to achieve low latency?

should the user request be routed to the closest region by the cloud provider? should A call B only within the same AZ? should all AZs be completely isolated, with users switched between them? if all instances of B in AZ X are dead, should the whole AZ X be taken out of service, or should traffic from service A in AZ X be directed to AZ Y? in the second case, do cloud providers offer such functionality?

or maybe A should see all the B in all AZs and it should call any of them? in this case what about latency when the request goes to B located far away?

what are the patterns to handle such scenarios?

Have you looked at DNS load balancing? Some DNS services, like Route53, support latency-based routing, where you can define different endpoints dedicated to each region. So the flow would be DNS (Latency Based) -> Load Balancer within a Region -> Internal Load Balancer (If Available in a Container Cluster) -> Container. Is this what you are looking for? — Ashan, Jun 13 '19 at 00:58

score 1 · Answer 1 · edited Jun 20 '20 at 09:12

High availability within region

You are right that a region can have multiple availability zones (AZs), and if you use a cloud provider (AWS/Azure) it can provision your instances across different availability zones.

Keep in mind that traffic between AZs does not travel over the public internet; the cloud provider carries it on its own dedicated network. Cloud providers (AWS/Azure) generally provide <2 ms latency for requests travelling between AZs, which is very good considering the high availability you get in return. If you deploy a Kubernetes cluster, the nodes can be distributed across different AZs. If an AZ or a node goes down, traffic is moved to a different AZ, which keeps the service highly available.
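The "prefer the local AZ, fall back to another AZ" behaviour asked about in the question can be sketched as a client-side selection rule. This is only an illustration under stated assumptions: the `Instance` type, the `pick_instance` function, the zone names, and the addresses are all hypothetical and do not correspond to the API of Kubernetes, Eureka, or Ribbon.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One running copy of service B (hypothetical type, illustrative fields)."""
    address: str
    zone: str
    healthy: bool = True


def pick_instance(instances, local_zone, rng=random):
    """Prefer a healthy instance in the caller's zone, else any healthy one."""
    healthy = [i for i in instances if i.healthy]
    if not healthy:
        raise RuntimeError("no healthy instance of the target service")
    same_zone = [i for i in healthy if i.zone == local_zone]
    # Cross-zone traffic is used only when the local zone has nothing healthy.
    return rng.choice(same_zone or healthy)


# Hypothetical registry contents: B in AZ X is down, B in AZ Y is up.
registry = [
    Instance("10.0.1.5", "az-x", healthy=False),
    Instance("10.0.2.7", "az-y"),
]
chosen = pick_instance(registry, local_zone="az-x")
print(chosen.address)
```

With this rule, service A in AZ X keeps its calls inside the zone while a healthy B exists there, and only pays the inter-AZ hop when it has to.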

High availability across region

Managing availability across regions can be a challenge. Generally you would replicate the same compute infrastructure (another Kubernetes cluster) in a different region and use the cloud provider's DNS to land the user in the nearest region. This is easier for stateless applications. However, things get tricky when it comes to cost and the database.
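As a rough illustration of what "land the user in the nearest region" means, the decision can be modelled as picking the healthy region with the lowest latency for that user. The function name and the latency figures below are made up for the example; a real DNS service makes this decision internally and this is not its API.

```python
def nearest_healthy_region(latency_ms, healthy_regions):
    """Return the healthy region with the lowest measured latency."""
    candidates = {r: ms for r, ms in latency_ms.items() if r in healthy_regions}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)


# Illustrative latencies seen by one user; not real measurements.
user_latency = {"eu-west-1": 20, "us-east-1": 95}

normal = nearest_healthy_region(user_latency, {"eu-west-1", "us-east-1"})
failover = nearest_healthy_region(user_latency, {"us-east-1"})
print(normal, failover)
```

When the nearest region fails its health check, the same rule sends the user to the next closest region, which is the cross-region failover described above.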

The database is a very important consideration. If you are using an RDBMS (strong consistency), you can typically have only one master node across all regions, with read replicas in different regions. A write operation, regardless of the region it originates in, can only go to that one write node and may incur extra latency.
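The single-writer, regional-reader layout can be sketched as a small routing rule. The class name and the endpoint host names are hypothetical, chosen only to show that reads stay local while every write crosses to the primary's region.

```python
class RegionalRouter:
    """Route writes to the single primary and reads to a local replica."""

    def __init__(self, primary, replicas):
        self.primary = primary    # the only endpoint that accepts writes
        self.replicas = replicas  # region name -> read replica endpoint

    def endpoint_for(self, operation, region):
        if operation == "write":
            # Writes always go to the primary, wherever the caller is.
            return self.primary
        # Reads use the local replica, or the primary if the region has none.
        return self.replicas.get(region, self.primary)


# Hypothetical endpoints for illustration only.
router = RegionalRouter(
    primary="db-primary.us-east-1.internal",
    replicas={"eu-west-1": "db-replica.eu-west-1.internal"},
)
print(router.endpoint_for("read", "eu-west-1"))
print(router.endpoint_for("write", "eu-west-1"))
```

A service in eu-west-1 reads locally but pays a cross-region round trip on every write, which is the latency cost mentioned above. Reads from a replica may also lag behind the primary.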

If you are using NoSQL with eventual consistency, you have to give up strong consistency. Some cloud providers do offer strong consistency across regions, but you have to mind the delay.

Cost is another factor. Duplicating infrastructure across regions roughly doubles your cost if both regions are active-active. You could go active-passive, where you launch the services only if a region is down, or pilot light, where you keep just the bare minimum running and scale up while traffic is being diverted.

Whether to run redundant infrastructure across regions really depends on your business application and primary needs.

what do you mean by 'If any [...] node is down the services are moved to different AZ'? If I have 20 instances of service A (to handle high traffic) and one of them is down, will a new AZ be used? It should try to launch another instance in the same AZ instead, right? — piotrek, Jun 13 '19 at 01:15
I have updated the answer, replacing "services" with "traffic". If a node is down then you are right, it will be relaunched in the same AZ. If an AZ is down then your node will be relaunched in another AZ. Depending on which services you are using, your load will be divided among the nodes that are healthy. If you are using Kubernetes, then regardless of whether an AZ or a node is down, Kubernetes will move your containers off the failed node and try to schedule them on other healthy nodes until the AZ or failed node is available again. — Imran Arshad, Jun 13 '19 at 01:28
