
After upgrading our AKS Kubernetes cluster from v1.23.8 to v1.24.3, our ingress stopped working properly. No errors are logged in events, and the ingress-nginx pod does not report any errors on the console. Everything looks fine from within the cluster, but all ports on the public IP are closed externally.

Even curl'ing the web apps that run in the cluster from within the cluster works fine. It seems like only the opening of the ports externally is broken. Ingress-nginx is deployed via a Helm release (HR v4.2.5).

I have a feeling it must be some config for the ingress or controller that needs to be changed.

UPDATE: we created fresh, plain AKS clusters and ran helm install quickstart ingress-nginx/ingress-nginx on each of 1.23.8 (works), 1.24.0 (does not work) and 1.24.3 (does not work either).

Any ideas or pointers?

sevenam

1 Answer


We found the issue.

For clusters on v1.24.0 and later, the load balancer health probes are set to HTTP and HTTPS instead of TCP. When we changed the health probes to use TCP, everything worked again.
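To see why switching the probes back to TCP made the symptom go away, here is a toy local model in Python. It is not Azure's implementation. It assumes that an HTTP probe only counts a 200 response as healthy (which is how Azure Load Balancer documents HTTP/HTTPS probes), that the probe requests / when no path is configured, and that the controller answers 404 on / but 200 on /healthz. FakeIngressHandler, tcp_probe, http_probe and run_probe_demo are hypothetical names.

```python
import http.client
import http.server
import socket
import threading


class FakeIngressHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for the ingress controller: 200 on /healthz, 404 on anything else."""

    def do_GET(self):
        self.send_response(200 if self.path == "/healthz" else 404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging


def tcp_probe(host, port, timeout=2.0):
    """TCP-style probe: healthy if the TCP handshake succeeds, whatever HTTP would return."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_probe(host, port, path="/", timeout=2.0):
    """HTTP-style probe: healthy only if GET <path> answers 200 (redirects are not followed)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


def run_probe_demo():
    """Start the fake backend on a free local port and probe it three ways."""
    server = http.server.HTTPServer(("127.0.0.1", 0), FakeIngressHandler)
    host, port = server.server_address
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return {
            "tcp": tcp_probe(host, port),
            "http /": http_probe(host, port, "/"),
            "http /healthz": http_probe(host, port, "/healthz"),
        }
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for probe, healthy in run_probe_demo().items():
        print(f"{probe:15} -> {'healthy' if healthy else 'unhealthy'}")
```

Under these assumptions the same backend passes the TCP probe and the HTTP probe on /healthz, but fails the HTTP probe on /. A load balancer that sees every backend as unhealthy stops forwarding traffic, which would look exactly like "all ports closed externally".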

Created an issue for AKS on this: https://github.com/Azure/AKS/issues/3210

The proper fix was to add the following annotation to the nginx service (see link to AKS issue above):

values:
  controller:
    service:
      annotations:
        service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz
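For a plain Helm install like the quickstart in the question (no HelmRelease values file), the same annotation can be passed with --set. Helm treats dots in a --set key as path separators, so the dots inside the annotation name have to be escaped with a backslash. The Python sketch below only builds the argument list and does not run Helm; helm_set_arg and build_helm_command are hypothetical helpers, and the release and chart names are taken from the question. It does not handle commas or other special characters in values.

```python
import shlex

# Annotation key from the fix above; its dots must not be read as nesting by --set.
PROBE_ANNOTATION = "service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path"


def helm_set_arg(path, value):
    """Join key segments with '.', escaping literal dots inside each segment."""
    escaped = ".".join(segment.replace(".", "\\.") for segment in path)
    return f"{escaped}={value}"


def build_helm_command(release, chart, probe_path="/healthz"):
    """Argument list for installing or upgrading the release with the probe-path annotation."""
    return [
        "helm", "upgrade", "--install", release, chart,
        "--set",
        helm_set_arg(["controller", "service", "annotations", PROBE_ANNOTATION], probe_path),
    ]


if __name__ == "__main__":
    # shlex.join quotes the backslashes so the line can be pasted into a POSIX shell.
    print(shlex.join(build_helm_command("quickstart", "ingress-nginx/ingress-nginx")))
```

Note that helm upgrade without --reuse-values resets the release to chart defaults plus whatever this command supplies, so pass any existing custom values again (for example with -f) if the release has them.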
sevenam
  • Thank you for posting the fix once you'd discovered it! I've just hit the same thing, and this saved my Sunday! It's pretty rubbish that this wasn't caught by Microsoft's QA – Dan Nov 06 '22 at 07:52
  • Thanks a lot for your question and answer! Saved me a lot of time. The problem is still there (at least Kubernetes v1.24.6 on AKS is affected too) – Sergej Masljukow Jan 24 '23 at 10:17
  • This totally just saved my bacon after migrating to 1.24.6 – Ian1971 Mar 23 '23 at 22:10
  • Microsoft really need to put this on the portal's AKS upgrade page, I had to google to find out what had happened. They do mention the extra annotation at https://learn.microsoft.com/en-us/azure/aks/ingress-basic?tabs=azure-cli#basic-configuration but that doesn't help if you're upgrading. – PaulD Mar 27 '23 at 23:49
  • Just made this change and it sorted my issue. Been battling it for a few weeks now after my upgrade! – John Fox Jun 01 '23 at 11:44