7

I installed the aws-load-balancer-controller on a new EKS cluster (version v1.21.5-eks-bc4871b).

I followed this guide https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.3/deploy/installation/ step by step, but when I try to deploy an Ingress object I get the error mentioned in the title. I tried the suggestions from GitHub issues such as https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2039 but didn't find an answer.

What else can I check?

yershalom

5 Answers

20

In case it might help others: I also had the original issue, using a Fargate profile plus a worker node for CoreDNS. The fix for me, found elsewhere, was simply adding

node_security_group_additional_rules = {
  ingress_allow_access_from_control_plane = {
    type                          = "ingress"
    protocol                      = "tcp"
    from_port                     = 9443
    to_port                       = 9443
    source_cluster_security_group = true
    description                   = "Allow access from control plane to webhook port of AWS load balancer controller"
  }
}
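
This is the node_security_group_additional_rules input of the terraform-aws-modules/eks module; if you manage security groups some other way, the equivalent is an inbound TCP 9443 rule on the node security group sourced from the cluster security group. After applying, you can confirm the rule landed with something like the following (the group ID is a placeholder):

# List the inbound rules on the node security group
aws ec2 describe-security-groups --group-ids sg-0123456789abcdef0 \
  --query "SecurityGroups[].IpPermissions"
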
ipeacocks
Emo
5

I assume you are getting an error message like the following; if that's not the case, please post your error.

Error from server (InternalError): error when creating "anymanifest.yaml": Internal error occurred: failed calling webhook "vingress.elbv2.k8s.aws": Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/validate-networking-v1beta1-ingress?timeout=10s": context deadline exceeded

Usually this happens because the EKS control plane can't reach the nodes on the webhook port.

Check the logs of the aws-load-balancer-controller pods to see which port it is listening on:

{"level":"info","ts":1643365219.2410042,"logger":"controller-runtime.webhook","msg":"serving webhook server","host":"","port":9443}

To fix that, allow inbound traffic on port 9443 from the EKS control plane (the cluster security group) in the security group of the worker nodes.
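
From the CLI, something along these lines should do it; the two security group IDs are placeholders for your node and cluster security groups:

# Allow the control plane (cluster SG) to reach the webhook port on the nodes
aws ec2 authorize-security-group-ingress \
  --group-id sg-NODE_SG_PLACEHOLDER \
  --protocol tcp --port 9443 \
  --source-group sg-CLUSTER_SG_PLACEHOLDER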

RuBiCK
2

In my case, I analysed the same issue this way (rough command equivalents of these checks are sketched after the list):

  1. I described the aws-load-balancer-webhook-service k8s Service and saw it had no endpoints
  2. I looked at the aws-load-balancer-controller k8s Deployment; it was stuck at 0/0 replicas :/
  3. I described the aws-load-balancer-controller ReplicaSet; the following error was raised by the replicaset-controller:
Error creating: pods "aws-load-balancer-controller-XXX-" is forbidden: error looking up service account kube-system/aws-load-balancer-controller: service account "aws-load-balancer-controller" not found
  4. So I checked the service account creation step, and saw that the CloudFormation process (done by eksctl create iamserviceaccount --name=aws-load-balancer-controller ...) had failed
    (Stack name: eksctl-<CLUSTER_NAME>-addon-iamserviceaccount-kube-system-aws-load-balancer-controller)
  5. It was because the add-on repository was wrong (bad account number)
    Choose the right one here: https://docs.aws.amazon.com/eks/latest/userguide/add-ons-images.html
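
The checks above roughly correspond to the following commands; the label selector is an assumption based on the chart's default labels:

# 1. Does the webhook Service have endpoints?
kubectl -n kube-system describe service aws-load-balancer-webhook-service
# 2. Is the controller Deployment scaled up?
kubectl -n kube-system get deployment aws-load-balancer-controller
# 3. What does the ReplicaSet report? (adjust the label if your install differs)
kubectl -n kube-system describe replicaset -l app.kubernetes.io/name=aws-load-balancer-controller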

Then, to apply the fix, I did the following (rough command equivalents are sketched after the list):

  1. Deleted the eksctl-<CLUSTER_NAME>-addon-iamserviceaccount-kube-system-aws-load-balancer-controller CloudFormation stack
  2. Launched the eksctl create iamserviceaccount --name=aws-load-balancer-controller command again
  3. Scaled the aws-load-balancer-controller Replicaset from 0 to 2

And it worked ;)
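
For reference, those steps look roughly like this; the cluster name, account ID and policy name are placeholders and should match whatever you used when first installing the controller:

# 1. Delete the failed CloudFormation stack
aws cloudformation delete-stack \
  --stack-name eksctl-<CLUSTER_NAME>-addon-iamserviceaccount-kube-system-aws-load-balancer-controller

# 2. Recreate the IAM service account (policy ARN is a placeholder)
eksctl create iamserviceaccount \
  --cluster=<CLUSTER_NAME> \
  --namespace=kube-system \
  --name=aws-load-balancer-controller \
  --attach-policy-arn=arn:aws:iam::<ACCOUNT_ID>:policy/AWSLoadBalancerControllerIAMPolicy \
  --override-existing-serviceaccounts \
  --approve

# 3. Scale the controller back up (scaling the Deployment is usually simpler than the ReplicaSet)
kubectl -n kube-system scale deployment aws-load-balancer-controller --replicas=2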

  • Thank you so much! In my case, the cause of the issue was the re-use of an existing role name in the `eksctl create iamserviceaccount` command. – Dominique PERETTI Sep 29 '22 at 14:46
0

Check the aws-load-balancer-controller pod description for more details; it may happen that the image is not available in ECR.
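
A quick way to see the pod events (including image pull errors); the label selector is an assumption based on the chart's default labels:

kubectl -n kube-system describe pods -l app.kubernetes.io/name=aws-load-balancer-controller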

  • 1
    As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Mar 27 '22 at 13:09
0

This is a follow-up to the accepted answer. If you are not using Fargate, or are confused by that answer, note that its original source is a Terraform script (the node_security_group_additional_rules block above).

To apply this solution from the AWS Console:

  1. Locate the cluster's security group and take note of its ID (a CLI way to find it is sketched after this list).

(Screenshot: security groups of the nodes and the cluster)

  2. Select the node security group and edit its inbound rules.
  3. Add a new rule: Custom TCP, port 9443, with the cluster security group ID as the source.

(Screenshot: the resulting inbound rules)
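
If you prefer to find the cluster security group ID from the CLI rather than the console, this query should return it (the cluster name is a placeholder):

aws eks describe-cluster --name <CLUSTER_NAME> \
  --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" --output text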

Tony B