1

I've considered round-robin DNS failover, but because DNS records are cached at several levels, there is a risk of stale records pointing clients at a site that is down.

If I own my own address space and AS number, then in data centre 1 I can announce my 1.1.1.x/x network to the world from my router, everyone can hit the load balancer at 1.1.1.1, and life is peachy. I could then have the same set-up in another data centre, where the router also announces my IP space and AS number, but to a different upstream provider. Users should then reach the set-up closest to them in routing terms, based on the AS-PATH and the other BGP metrics used in their local ISP's network.

Now, let's pretend something terrible happens at data centre 1 and router 1 goes offline. My IP space and AS are no longer announced from there, so all traffic to 1.1.1.1 (the load balancer) falls back to data centre 2 alone. Do people do this, or is it a ridiculous idea? Have I missed something blindingly obvious about why I shouldn't do this? Is this just not practical, or a genius plan?
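To make the scenario concrete, here is a minimal sketch of the fallback behaviour described above. It assumes the remote ISP picks a route purely on shortest AS path, which is a simplification, since real BGP compares local preference and other attributes first. The AS numbers are hypothetical values from the documentation range, and `Announcement` and `best_site` are illustrative names, not part of any router software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announcement:
    site: str        # data centre that originated the route
    as_path: tuple   # AS numbers the route traversed, nearest AS first


def best_site(announcements):
    """Return the site with the shortest AS path, or None if no route is left.

    Ties are broken by site name so the result is deterministic.
    """
    if not announcements:
        return None
    return min(announcements, key=lambda a: (len(a.as_path), a.site)).site


# Hypothetical view from one remote ISP; 64500 stands in for "my" AS.
routes_seen_by_isp = [
    Announcement("dc1", (64501, 64500)),
    Announcement("dc2", (64502, 64503, 64500)),
]

print(best_site(routes_seen_by_isp))

# Router 1 goes offline, so its announcement is withdrawn.
remaining = [a for a in routes_seen_by_isp if a.site != "dc1"]
print(best_site(remaining))
```

With both sites announcing, this ISP prefers dc1 because its path is shorter; once dc1's announcement disappears, the only remaining route leads to dc2.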

EEAA
jwbensley

2 Answers

4

It works fine. It just requires a lot more engineering work to get going: you need routers, BGP connectivity, your own IP space, etc. I assume in your example above that you have data centres 1 and 2 running all the time.

Many people do this successfully; look up 'anycast', which is the name for what you're trying to do. The big problem is that it works much better for stateless, UDP-based services. If you're downloading a large file via HTTP and there's an outage on the router side, your traffic will go to the new 1.1.1.1, which has no state for your TCP connection and will drop it.
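The stateful problem can be shown with a toy model. This is only a sketch: `Site` is a hypothetical class that keeps a per-site table of known connections, the client address is a made-up documentation address, and the packet kinds are simplified labels rather than real TCP segments.

```python
class Site:
    """Toy model of one anycast site with its own connection table."""

    def __init__(self, name):
        self.name = name
        self.connections = set()  # (client_ip, client_port) pairs with known state

    def handle(self, client, kind):
        """Return the site's reply to a packet of the given kind."""
        if kind == "SYN":
            # A new connection creates state at this site only.
            self.connections.add(client)
            return "SYN-ACK"
        if client in self.connections:
            return "ACK"
        # Mid-stream packet for a connection this site has never seen.
        return "RST"


dc1, dc2 = Site("dc1"), Site("dc2")
client = ("198.51.100.7", 40000)

print(dc1.handle(client, "SYN"))    # handshake at dc1
print(dc1.handle(client, "DATA"))   # download in progress

# dc1's route is withdrawn; the same flow now arrives at dc2.
print(dc2.handle(client, "DATA"))   # dc2 has no state for this flow
print(dc2.handle(client, "SYN"))    # a fresh connection works again
```

The in-flight download is reset because dc2 never saw the handshake, while a client that reconnects is served normally by dc2.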

Aaron
  • Thanks for your answer. I have since searched for "anycast" on Google and ServerFault. It turns out I just needed to learn that word, "anycast"; now I have a mountain to read and research, so thanks very much :D – jwbensley Jan 31 '12 at 09:56
  • Can this be used for DR-site redundancy (taking over for a site that is no longer responding)? Or is it only useful for proximity route optimization? – Eric Falsken Mar 05 '13 at 20:00
  • @EricFalsken Generally speaking, both sites would get traffic at the same time, so it wouldn't be a true 'DR' thing. You would also need the necessary logic to figure out "when a site is not responding" to remove the route. For example, you would need your router to communicate somehow with your NMS to withdraw the route if, say, the web server stops responding on port 80. – Aaron Mar 05 '13 at 20:30
  • @Aaron, what if my router were to become unavailable? Would the traffic be auto-routed to the other site instead? I'm ok with both sites being "live" as long as users of one are automatically sent to the other when it stops responding. – Eric Falsken Mar 05 '13 at 22:01
  • @EricFalsken yes, typically if the router originating the route goes away, all the traffic for that route would go to the 'next best' location. – Aaron Mar 05 '13 at 23:53
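The "withdraw the route when the web server stops responding" logic from the comments could look roughly like the sketch below. Everything here is an assumption for illustration: `RouteGuard`, `port_open`, the failure threshold, the 1.1.1.0/24 prefix, and the plain "announce"/"withdraw" strings are hypothetical, and the `send` callback stands in for whatever mechanism actually tells your router or BGP daemon to change its announcements.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class RouteGuard:
    """Announce a prefix while a health probe passes; withdraw it after
    several consecutive failures so one lost probe does not flap the route."""

    def __init__(self, probe, send, prefix, fail_threshold=3):
        self.probe = probe                  # callable returning True when healthy
        self.send = send                    # callable taking a command string
        self.prefix = prefix
        self.fail_threshold = fail_threshold
        self.failures = 0
        self.announced = False

    def tick(self):
        """Run one health check and update the announcement if needed."""
        if self.probe():
            self.failures = 0
            if not self.announced:
                self.send(f"announce {self.prefix}")
                self.announced = True
        else:
            self.failures += 1
            if self.announced and self.failures >= self.fail_threshold:
                self.send(f"withdraw {self.prefix}")
                self.announced = False


# Demo with a scripted probe instead of a real web server.
results = iter([True, False, False, False, True])
sent = []
guard = RouteGuard(lambda: next(results), sent.append, "1.1.1.0/24")
for _ in range(5):
    guard.tick()
print(sent)
```

In a real deployment the probe might be `lambda: port_open("192.0.2.10", 80)` run on a timer, with the address being whatever your web server actually uses. Requiring several consecutive failures is a design choice to avoid withdrawing the route on a single dropped probe.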
1

One other option outside of BGP would be global load balancing. To avoid the DNS propagation delay, I had been looking at a SaaS solution called Cloud Leverage. They use IPv6 multicast and a proprietary load balancer to make it work. We haven't gone with them yet, but that's simply due to priorities.

I know it doesn't answer your BGP question, but I wanted to throw it out there just in case.

Eric C. Singer
  • Interesting idea, Eric. I am thinking of using RR-DNS with anycast for each A record returned, plus a CDN for static content, so that should all do the trick. Curiously though, how does the IPv6 multicast scenario work for you? I'm intrigued. – jwbensley Jan 31 '12 at 09:58
  • Like I said, we're not using it yet, but you would have a single A record and it would point to their global IP. Then you would have several options for a client to find the right server. – Eric C. Singer Jan 31 '12 at 16:16