I have a question about achieving high availability over an AWS VPN.

Context:

I need to establish a site-to-site VPN connection between my AWS VPC and a big corporate. The VPN link must support inbound HTTP connections to an application in my AWS VPC. For various reasons, the big corporate I am connecting to will not allocate me a non-conflicting private (RFC1918) subnet to use. Instead, they require me to use NAT to expose my services over the VPN on a public IP of my choice (one that I will reserve). I believe I have set this up successfully (subject to testing) with the right combination of routing rules by following this guide (without a separate proxy). However, I can only direct incoming connections to a single IP.

Question:

I am wondering if there is a way that I can direct connections to a highly available load balancer instead? This would allow better scalability and also high availability. Things I have considered:

  • Using an AWS external ELB:
    • These do not have reliable IP addresses or a narrow enough range. I would not be able to add such a range on the right-side routing rules as such a range could conflict with any other service hosted on AWS.
  • Using an AWS internal ELB:
    • These have a public DNS record and a predictable range. However, the addresses are in the private IP ranges, so the client system cannot use them: the client is only allowed to create static routes to non-RFC1918 public IPs.
  • Implementing my own load balancer such as HAProxy:
    • This would address the scalability issue, but would still leave a single point of failure in the system (the HAProxy node itself).
    • Additionally this is one more machine I have to maintain.
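
To make the RFC1918 constraint in the options above concrete, here is a minimal Python sketch that checks whether an address falls in one of the three RFC1918 private ranges. The function name `is_rfc1918` and the sample addresses are illustrative assumptions, not part of my actual setup; `203.0.113.10` is a documentation address standing in for the reserved public IP.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address lies inside an RFC1918 private range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


if __name__ == "__main__":
    # Hypothetical addresses: an internal ELB node and a reserved public IP.
    for candidate in ("10.0.1.25", "203.0.113.10"):
        verdict = "RFC1918 (not routable by the client)" if is_rfc1918(candidate) else "non-RFC1918"
        print(candidate, "->", verdict)
```

An internal ELB would always land in the first category, which is why the client cannot add a static route for it.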

Does anyone know if there is a way to reference an ELB in the VPC routing tables? Or have any other suggestions on how to achieve this?

Thanks

kabadisha
  • What type of VPN are you using? – user9517 Sep 09 '15 at 08:42
  • The client is using a Cisco ASA 5515. I am trying to use the AWS VPN endpoint. – kabadisha Sep 09 '15 at 09:58
  • How can you use the Amazon Virtual Private Gateway if you have an RFC 1918 conflict on the corporate side? Are you NATing on the Cisco ASA? If you're using the Amazon Virtual Private Gateway, you should have access to redundant VTI interfaces on the Amazon side, each in a different AZ, which should provide sufficient HA. Sorry, not really clear on what your issue is. – Garreth McDaid Sep 09 '15 at 10:53
  • No, NATing on my side. The client sends a request to a public IP I provide, the client's network routes this request over the VPN, then I NAT this request to an internal IP on my private subnet :-) The Amazon VPG would give me a highly available VPN, but my challenge is providing a highly available route to the web service for requests that arrive over that VPN. The web service is available on a cluster of several machines on my end of the link and I would like to load balance requests across this cluster. Unfortunately I can't find a way to route requests arriving over the VPN to an ELB. – kabadisha Sep 09 '15 at 12:09
  • Still not exactly clear on what you are trying to do. An internal Amazon ELB has a hostname. Create a CNAME record for that hostname, which will direct requests to an RFC1918 IP address. Your network configuration should correctly route to that IP address. If your ELB has multiple subnets in multiple AZs attached, and you are using the Amazon VPG for your VPN, you don't have any HA concerns. – Garreth McDaid Sep 09 '15 at 16:21
  • Understood, however I can't use a CNAME record, precisely because it will return an RFC1918 IP address. The client application (located in the corporate network) would resolve such a DNS record to the correct RFC1918 IP address (the private IP of the target machine within the AWS network). Unfortunately, the big corporate will not add the static route required on their end to route the request over the VPN because that would require them to reserve that RFC1918 IP address for the service I am hosting. They will only set up such a static route on their end for a non-RFC1918 IP address. – kabadisha Sep 09 '15 at 18:31
  • @GarethMcDaid one problem is that the IP address of the ELB is not static. I have encountered this same restriction when interconnecting to other corporate networks. It seems like the wrong way to solve the problem, assigning a public IP "inside" a private network and essentially wasting it, treating it as a private IP, but that was their policy. I assigned that public IP to a loopback adapter on the tunnel machine, and configured a local HAProxy to bind to it. Since it's only handling inbound traffic from the tunnel, it works. – Michael - sqlbot Sep 09 '15 at 22:47

1 Answer

The machine terminating the tunnel is a single point of failure already, isn't it? If so, running HAProxy right there seems like the thing to do (and I'm not just saying so because that's the way I do it, even though it is).

I can count my production outages caused by HAProxy on one hand without using any fingers (or thumb). Asynchronous DNS in version 1.6 (still in development as of this writing) would let you use an internal ELB as a back-end to HAProxy, allowing you to pretty much set and forget and use the existing ELB/EC2 integrations for your actual capacity scaling.
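
To illustrate why re-resolving the ELB hostname matters (its addresses are not static), here is a rough Python sketch of the idea, not HAProxy's actual implementation. The function names, the `resolver` parameter, and the ELB hostname in the usage comment are all hypothetical; the resolver defaults to the standard library's `socket.getaddrinfo`.

```python
import socket


def resolve_backends(hostname, port, resolver=socket.getaddrinfo):
    """Return the sorted IPv4 addresses the hostname currently resolves to.

    `resolver` is injectable so the logic can be exercised without real DNS.
    """
    infos = resolver(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})


def backends_changed(previous, current):
    """True when the set of resolved addresses differs between two lookups."""
    return set(previous) != set(current)


# Hypothetical usage: poll periodically and update the back-end list on change.
# current = resolve_backends("internal-my-elb.example.internal", 80)
# if backends_changed(last_seen, current):
#     ...  # reconfigure the proxy's back-ends
```

A proxy that resolves the name only once at startup would keep sending traffic to addresses the ELB may have stopped using, which is the gap that run-time DNS resolution is meant to close.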

C3, C4, M3, R3, and T2 instance types also support the relatively new instance recovery feature, which stops, recreates, and restarts your instance on different hardware but with the same instance id, elastic IP, and EBS volumes if it stops responding favorably to instance health checks.
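
Instance recovery is driven by a CloudWatch alarm on the `StatusCheckFailed_System` metric with the EC2 recover action. As a minimal sketch, the helper below only builds the alarm parameters; it assumes you would pass them to CloudWatch's PutMetricAlarm call (for example `put_metric_alarm` in boto3). The function name, alarm name, instance ID, region, and the period and evaluation counts are illustrative assumptions.

```python
def build_recovery_alarm_params(instance_id, region):
    """Build PutMetricAlarm parameters that trigger EC2 instance recovery.

    Thresholds here are illustrative: the alarm fires after the system
    status check has failed for two consecutive 60-second periods.
    """
    return {
        "AlarmName": f"recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Minimum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        # The recover action moves the instance to healthy hardware.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


if __name__ == "__main__":
    # Placeholder instance ID and region, for illustration only.
    params = build_recovery_alarm_params("i-0123456789abcdef0", "eu-west-1")
    print(params["AlarmActions"][0])
```

Keeping the parameters in a plain dictionary makes them easy to review before any API call is made.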

Michael - sqlbot
  • The Virtual Private Gateway offered by Amazon is considered a highly available system - Amazon manage and deal with any failures. I agree that failure of HAProxy is unlikely, but just trying to do due diligence :-) – kabadisha Sep 09 '15 at 12:11
  • Instance recovery looks like an interesting fall-back position though, thanks :-) – kabadisha Sep 09 '15 at 12:12
  • @kabadisha I think I read something into your question that was not really there. I thought you were using OpenSwan to terminate the VPN, but you are using the AWS VPC native VPN service? I don't believe that will work at all. Traffic from the VPC VPN gateway is not subject to any VPC route table. It knows how to reach instances on their assigned addresses within the VPC supernet, and nothing more. It is intended for connections to networks that are fully trusted. I may want to write an entirely different answer, here. – Michael - sqlbot Sep 09 '15 at 22:57
  • Yes indeed - I am using the AWS native VPN service. I would be very surprised if traffic from the VPN gateway was not subject to the VPC routing rules. Since the VPG is bound to the VPC, shouldn't traffic from it be subject to the same routing rules? – kabadisha Sep 11 '15 at 14:07
  • @kabadisha, but *which* routing rules would those be? VPC route tables are applied to subnets. The "main" route table applies to subnets without an assigned route table. No route table applies to VPN connections, because the only routing they are aware of is the built-in/implicit route to each subnet in the VPC. – Michael - sqlbot Sep 11 '15 at 17:35