7

I installed the aws-load-balancer-controller on a new EKS cluster (version v1.21.5-eks-bc4871b).

I followed this guide https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.3/deploy/installation/ step by step, but when I try to deploy an Ingress object I get the error mentioned in the title. I tried the suggestions from GitHub issues such as https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2039 but didn't find an answer.

What else can I check?

yershalom

5 Answers

20

In case it might help others: I also had the original issue, using a Fargate profile plus a worker node for CoreDNS. The fix for me, found elsewhere, was simply adding

node_security_group_additional_rules = {
  ingress_allow_access_from_control_plane = {
    type                          = "ingress"
    protocol                      = "tcp"
    from_port                     = 9443
    to_port                       = 9443
    source_cluster_security_group = true
    description                   = "Allow access from control plane to webhook port of AWS load balancer controller"
  }
}
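
This is the node_security_group_additional_rules input of the terraform-aws-modules/eks module; if you manage security groups some other way, the equivalent is an inbound TCP 9443 rule on the node security group sourced from the cluster security group. After applying, you can confirm the rule landed with something like the following (the group ID is a placeholder):

# List the inbound rules on the node security group
aws ec2 describe-security-groups --group-ids sg-0123456789abcdef0 \
  --query "SecurityGroups[].IpPermissions"
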
ipeacocks
Emo
5

I assume you are getting an error message like the following; if that's not the case, please post your error.

Error from server (InternalError): error when creating "anymanifest.yaml": Internal error occurred: failed calling webhook "vingress.elbv2.k8s.aws": Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/validate-networking-v1beta1-ingress?timeout=10s": context deadline exceeded

Usually this happens because the EKS control plane can't reach the nodes on the webhook port.

Check the logs of the aws-load-balancer-controller pods to see which port it is listening on:

{"level":"info","ts":1643365219.2410042,"logger":"controller-runtime.webhook","msg":"serving webhook server","host":"","port":9443}

To fix that, allow inbound traffic on port 9443 from the EKS control plane (the cluster security group) in the security group of the worker nodes.
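
From the CLI, something along these lines should do it; the two security group IDs are placeholders for your node and cluster security groups:

# Allow the control plane (cluster SG) to reach the webhook port on the nodes
aws ec2 authorize-security-group-ingress \
  --group-id sg-NODE_SG_PLACEHOLDER \
  --protocol tcp --port 9443 \
  --source-group sg-CLUSTER_SG_PLACEHOLDER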

RuBiCK
2

In my case, I analysed the same issue this way (rough command equivalents of these checks are sketched after the list):

  1. I described the aws-load-balancer-webhook-service k8s Service and saw it had no endpoints
  2. I looked at the aws-load-balancer-controller k8s Deployment; it was stuck at 0/0 replicas :/
  3. I described the aws-load-balancer-controller ReplicaSet; the following error was raised by the replicaset-controller:
Error creating: pods "aws-load-balancer-controller-XXX-" is forbidden: error looking up service account kube-system/aws-load-balancer-controller: service account "aws-load-balancer-controller" not found
  4. So I checked the service account creation step, and saw that the CloudFormation process (done by eksctl create iamserviceaccount --name=aws-load-balancer-controller ...) had failed
    (Stack name: eksctl-<CLUSTER_NAME>-addon-iamserviceaccount-kube-system-aws-load-balancer-controller)
  5. It was because the add-on repository was wrong (bad account number)
    Choose the right one here: https://docs.aws.amazon.com/eks/latest/userguide/add-ons-images.html
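
The checks above roughly correspond to the following commands; the label selector is an assumption based on the chart's default labels:

# 1. Does the webhook Service have endpoints?
kubectl -n kube-system describe service aws-load-balancer-webhook-service
# 2. Is the controller Deployment scaled up?
kubectl -n kube-system get deployment aws-load-balancer-controller
# 3. What does the ReplicaSet report? (adjust the label if your install differs)
kubectl -n kube-system describe replicaset -l app.kubernetes.io/name=aws-load-balancer-controller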

Then, to apply the fix, I did the following (rough command equivalents are sketched after the list):

  1. Deleted the eksctl-<CLUSTER_NAME>-addon-iamserviceaccount-kube-system-aws-load-balancer-controller CloudFormation stack
  2. Launched the eksctl create iamserviceaccount --name=aws-load-balancer-controller command again
  3. Scaled the aws-load-balancer-controller Replicaset from 0 to 2

And it worked ;)
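
For reference, those steps look roughly like this; the cluster name, account ID and policy name are placeholders and should match whatever you used when first installing the controller:

# 1. Delete the failed CloudFormation stack
aws cloudformation delete-stack \
  --stack-name eksctl-<CLUSTER_NAME>-addon-iamserviceaccount-kube-system-aws-load-balancer-controller

# 2. Recreate the IAM service account (policy ARN is a placeholder)
eksctl create iamserviceaccount \
  --cluster=<CLUSTER_NAME> \
  --namespace=kube-system \
  --name=aws-load-balancer-controller \
  --attach-policy-arn=arn:aws:iam::<ACCOUNT_ID>:policy/AWSLoadBalancerControllerIAMPolicy \
  --override-existing-serviceaccounts \
  --approve

# 3. Scale the controller back up (scaling the Deployment is usually simpler than the ReplicaSet)
kubectl -n kube-system scale deployment aws-load-balancer-controller --replicas=2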

  • Thank you so much! In my case, the cause of the issue was the re-use of an existing role name in the `eksctl create iamserviceaccount` command. – Dominique PERETTI Sep 29 '22 at 14:46
0

Check the aws-load-balancer-controller pod description for more details; it may happen that the image is not available in ECR.
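
A quick way to see the pod events (including image pull errors); the label selector is an assumption based on the chart's default labels:

kubectl -n kube-system describe pods -l app.kubernetes.io/name=aws-load-balancer-controller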

  • 1
    As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Mar 27 '22 at 13:09
0

This is a follow-up to the accepted answer. If you are not using Fargate, or are confused by that answer, note that its original source is a Terraform script (the node_security_group_additional_rules block above).

To apply this solution from the AWS Console:

  1. Locate the cluster's security group and take note of its ID (a CLI way to find it is sketched after this list).

(Screenshot: security groups of the nodes and the cluster)

  2. Select the node security group and edit its inbound rules.
  3. Add a new rule: Custom TCP, port 9443, with the cluster security group ID as the source.

(Screenshot: the resulting inbound rules)
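
If you prefer to find the cluster security group ID from the CLI rather than the console, this query should return it (the cluster name is a placeholder):

aws eks describe-cluster --name <CLUSTER_NAME> \
  --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" --output text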

Tony B