When microservice A wants to talk to microservice B, some kind of load balancing is involved, because there can be multiple instances of B. It can be either infrastructure load balancing (Kubernetes) or client-side load balancing (Eureka + Ribbon). This is pretty simple when everything is deployed in a single region and a single availability zone (AZ).
What happens when we want multi-zone high availability (HA) and also want to use the closest region to keep latency low?
Should user requests be routed to the closest region by the cloud provider? Should A call B only within the same AZ? Should all AZs be completely isolated, with users switched between them? If all instances of B in AZ X are dead, should the whole of AZ X be taken out of service, or should traffic from service A in AZ X be directed to B in AZ Y? In the second case, do cloud providers offer such functionality?
Or maybe A should see all instances of B in all AZs and be allowed to call any of them? In that case, what about latency when the request goes to a B located far away?
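To make the options above more concrete, here is a rough sketch of the "prefer the same AZ, fall back to another AZ" idea as client-side selection logic. It is only an illustration under my own assumptions: `Instance`, `choose_instance`, the host names, and the zone names are hypothetical and do not correspond to the Eureka, Ribbon, or Kubernetes APIs.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One running copy of service B (hypothetical model, not a real registry type)."""
    host: str
    zone: str
    healthy: bool = True


def choose_instance(instances, caller_zone, rng=random):
    """Pick a healthy instance of B, preferring the caller's own zone.

    Falls back to a healthy instance in any other zone only when the
    caller's zone has none. Raises LookupError if nothing is healthy.
    """
    healthy = [i for i in instances if i.healthy]
    local = [i for i in healthy if i.zone == caller_zone]
    candidates = local or healthy  # stay local if possible, otherwise cross zones
    if not candidates:
        raise LookupError("no healthy instance of B in any zone")
    return rng.choice(candidates)


registry = [
    Instance("b1.zone-x", "zone-x"),
    Instance("b2.zone-x", "zone-x"),
    Instance("b3.zone-y", "zone-y"),
]

# A in zone-x stays local while zone-x still has healthy instances of B.
print(choose_instance(registry, "zone-x").zone)

# If every B in zone-x is dead, A in zone-x fails over to zone-y.
degraded = [Instance(i.host, i.zone, healthy=(i.zone != "zone-x")) for i in registry]
print(choose_instance(degraded, "zone-x").zone)
```

With this approach the cross-zone latency cost is only paid while the local zone has no healthy B, which is the trade-off I am asking about.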
What are the patterns for handling such scenarios?