
I have an API that's currently served from a subdomain under our main wildcard certificate, e.g. api.domain.com. The API is currently served by an Nginx ingress controller, and the plan is to replace it with another ingress controller (namely Ambassador) while avoiding downtime as much as possible. Since we're using AWS EKS, I have the following configured:

  • an ALB configured pointing to the Nginx installation
  • a separate target group pointing to the Ambassador installation
  • the ALB listeners configured with weighted routing to split traffic between the two installations (the idea is to increase/decrease traffic at will and observe how the new ingress controller handles it)
  • a Route53 record set alias pointing to the Load Balancer (api.domain.com)
  • ingress exposed via Nginx for the same as the record above (api.domain.com)
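
For reference, the weighted-routing piece above looks roughly like the following as a boto3 `elbv2` forward action (a sketch only — the target group ARNs and weights are hypothetical placeholders):

```python
# Sketch of the ALB weighted "forward" action described above, in the shape
# accepted by boto3's elbv2 client (modify_listener / create_rule).
# The target group ARNs below are hypothetical placeholders.

def weighted_forward_action(nginx_tg_arn, ambassador_tg_arn,
                            nginx_weight=80, ambassador_weight=20):
    """Build a forward action that splits traffic between two target groups."""
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": nginx_tg_arn, "Weight": nginx_weight},
                {"TargetGroupArn": ambassador_tg_arn, "Weight": ambassador_weight},
            ]
        },
    }

action = weighted_forward_action(
    "arn:aws:elasticloadbalancing:...:targetgroup/nginx/abc",
    "arn:aws:elasticloadbalancing:...:targetgroup/ambassador/def",
)
```

Shifting traffic between the controllers is then just a matter of updating the two weights and calling `modify_listener` (or updating the rule) again.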

Since ingress is handled by Kubernetes' internal DNS, I was hoping the load balancer would direct traffic seamlessly to both target groups (Nginx and Ambassador). Instead, it directs traffic only to the Ambassador target group, while the host that is also defined as an Nginx ingress rule just returns 503 Service Unavailable. Note that the Nginx installation is reachable through other DNS mappings, so everything else works fine.

Any idea what I'm doing wrong? The whole idea was to do weighted routing at LB level and not DNS level to avoid DNS propagation issues.

Bogdan Emil Mariesan
  • Hi, look at the target groups. Do any of them have unhealthy targets? It should be working the way you describe, so I presume it's an issue with the ALB connecting to the TG. – Chris Williams May 18 '20 at 06:56
  • both target groups have fully healthy targets; if I remove either of the two, it works just fine – Bogdan Emil Mariesan May 18 '20 at 07:03
  • @mokugo-devops actually I was mildly wrong: if I disable the Ambassador target group, with the Nginx ingress still configured, it doesn't work until I disable the Route 53 record. We have the main certificate *.domain.com and the other registered under that... – Bogdan Emil Mariesan May 18 '20 at 07:29
  • @mokugo-devops found the issue, and it was a rather silly mistake on my side: in the listener section of my ALB config I forgot to add the rule to serve the new api.domain.com Route 53 entry, and the existing listener rules were not covering it – Bogdan Emil Mariesan May 18 '20 at 08:13
  • Ah, well, glad it's sorted for you :) – Chris Williams May 18 '20 at 08:14
  • @mokugo-devops also thanks for confirming that I was not crazy :) I'm rather new to Kubernetes, but I hope I've got at least the basics right. I've detailed in an answer the steps required for anyone experiencing similar issues; it's not that hard to have multiple ingress controllers running in parallel once you've done it once :) – Bogdan Emil Mariesan May 18 '20 at 08:26

1 Answer


As written in the comments on my own question, in order for this to work and to have multiple ingress controllers exposed under the same AWS ALB, you have to go through the following checklist:

The assumption is that you are already using Nginx or another default controller, with an ingress exposed for api.sub-domain.domain.com under a wildcard certificate such as *.sub-domain.domain.com.

  1. Add a Route53 Alias record for the desired domain e.g. api.sub-domain.domain.com
  2. Add a target group pointing to the port & instances/instance groups of your new Ingress Controller
  3. In the ALB add the target group to the existing listener rules with the desired weight for traffic routing
  4. (Optional) You might need to define a new rule with a Host header condition matching the alias record
  5. Update the listener rules
  6. Refresh the api.sub-domain.domain.com page and check the Network tab of your browser's dev tools for the server response header. In my case it was switching between Envoy (the underlying proxy used by Ambassador) and Nginx (or you might see something related to PHP 7.*)
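
Steps 3-5 — the rule that was missing in my original setup — can be sketched as a boto3 `create_rule` payload. This is an illustration under assumptions, not a drop-in command: the listener ARN, target group ARNs, hostname, and priority are all placeholders.

```python
# Sketch of the host-header listener rule from steps 3-5, as a payload for
# boto3's elbv2 create_rule. ARNs, hostname, and priority are hypothetical.

def host_header_rule(listener_arn, host, target_groups, priority=10):
    """Build a rule that forwards requests for `host` across weighted
    target groups. `target_groups` is a list of (arn, weight) pairs."""
    return {
        "ListenerArn": listener_arn,
        "Priority": priority,
        "Conditions": [
            {"Field": "host-header", "HostHeaderConfig": {"Values": [host]}},
        ],
        "Actions": [
            {
                "Type": "forward",
                "ForwardConfig": {
                    "TargetGroups": [
                        {"TargetGroupArn": arn, "Weight": weight}
                        for arn, weight in target_groups
                    ]
                },
            }
        ],
    }

rule = host_header_rule(
    "arn:aws:elasticloadbalancing:...:listener/app/my-alb/xyz",
    "api.sub-domain.domain.com",
    [
        ("arn:aws:elasticloadbalancing:...:targetgroup/nginx/abc", 50),
        ("arn:aws:elasticloadbalancing:...:targetgroup/ambassador/def", 50),
    ],
)
```

Without a rule like this, requests for the new host fall through to the listener's default action (or none at all), which is exactly what produced the 503 in my case.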

Errors to be aware of:

  • 503 Service Unavailable - might indicate that you don't have a listener rule configured, or that it's misconfigured. Double-check the response headers for any header mentioning ELB; if one is present, it's clearly a configuration issue
  • 504 Gateway Timeout - your target groups are not configured correctly; the ports you've configured are not reaching your ingress controller
  • dns_probe_finished_nxdomain - your Route53 record is not properly configured or not defined at all; make sure you have CNAME and A records configured for your domain and the required subdomains.
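
A quick way to apply the first bullet is to look at the Server header of the error response: an error generated by the ALB itself typically carries something like "Server: awselb/2.0", while one proxied from the cluster shows the upstream server instead. A hypothetical helper to classify a response:

```python
# Hypothetical helper for triaging the errors listed above. An ALB-generated
# error typically carries a "Server: awselb/..." header, while an error
# proxied from the cluster shows the upstream server (nginx, envoy, ...).

def error_source(status, headers):
    """Guess whether an error response came from the ALB itself or upstream."""
    server = headers.get("Server", "").lower()
    if status == 503 and "awselb" in server:
        return "alb"            # likely a missing/misconfigured listener rule
    if status == 504:
        return "target-group"   # ports likely not reaching the ingress controller
    return "upstream"           # the request made it into the cluster

source = error_source(503, {"Server": "awselb/2.0"})
```

You'd feed it the status code and headers from your HTTP client of choice (curl -sI, requests, browser dev tools, etc.).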