I can't get a health check to succeed for a .NET app running in an IIS container when using Container Native Load Balancing (CNLB).
I have a Network Endpoint Group (NEG) created by an Ingress resource definition in GKE on a VPC-native cluster.
When I circumvent CNLB by either exposing the NodePort or making a service of type LoadBalancer, the site resolves without issue.
All the pod conditions from kubectl describe pod look good: pod readiness
The network endpoints show up when running kubectl describe endpoints: ready addresses
This is the health check that is generated by the load balancer: GCP Health Check
When I hit these endpoints from other containers or VMs in the same VPC, /health.htm responds with a 200. Here is the response from a container in the same namespace (I have also reproduced this from a Linux VM that is outside the cluster but in the same VPC): endpoint responds
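For anyone who wants to repeat the manual check described above, here is a minimal Python sketch of the same probe. The function names, the default timeout, and the example IPs in the comments are hypothetical and not taken from my setup; only the /health.htm path and port 80 come from the manifests in this post.

```python
import urllib.error
import urllib.request

# Bypass any proxy from the environment so the pod IP is contacted directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url, timeout=5.0):
    """Return the HTTP status code for url, or None if it cannot be reached in time."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return None


def probe_all(pod_ips, path="/health.htm", port=80, timeout=5.0):
    """Probe every endpoint IP and return a dict of {ip: status code or None}."""
    return {ip: probe(f"http://{ip}:{port}{path}", timeout) for ip in pod_ips}


# Example call with placeholder addresses (substitute the real endpoint IPs):
# print(probe_all(["10.0.0.1", "10.0.0.2"]))
```

A None result here would correspond to the timeout the load balancer reports, while a 200 from inside the VPC only shows that the pod answers in-VPC clients. It says nothing about whether the load balancer's own probes get through.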
But in spite of it all, the health check is reporting the pods in my NEG unhealthy: Unhealthy Endpoints
The Stackdriver logs confirm that the health check requests are timing out, but I don't understand why the endpoints respond to other instances and not to the LB: Stackdriver Health Check Log
And I confirmed that GKE created what looks like the correct firewall rule that should allow traffic from the LB to the pods: firewall
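The firewall check above can also be done programmatically. Below is a rough sketch that tests whether a rule's source ranges cover the health check probe ranges Google documents for its load balancers (130.211.0.0/22 and 35.191.0.0/16). The function name and the way the rule's ranges are passed in are my own assumptions; the ranges would have to be copied from the actual rule.

```python
import ipaddress

# Source ranges Google documents for load balancer health check probes.
HEALTH_CHECK_RANGES = ("130.211.0.0/22", "35.191.0.0/16")


def uncovered_ranges(rule_source_ranges, required=HEALTH_CHECK_RANGES):
    """Return the required ranges that no single source range of the rule contains."""
    allowed = [ipaddress.ip_network(r) for r in rule_source_ranges]
    missing = []
    for r in required:
        net = ipaddress.ip_network(r)
        # subnet_of raises TypeError across IP versions, so compare versions first.
        if not any(net.version == a.version and net.subnet_of(a) for a in allowed):
            missing.append(r)
    return missing


# Example with the ranges as they would be copied from the firewall rule:
# print(uncovered_ranges(["130.211.0.0/22", "35.191.0.0/16"]))
```

This only looks at source ranges, and it does not detect a required range that is covered by a union of several smaller ranges. The rule also has to allow the serving port (80 here) and apply to the nodes that host the pods, which this sketch does not verify.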
Here is the YAML I'm working with:
Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: subdomain.domain.tld
  name: subdomain-domain-tld
  namespace: subdomain-domain-tld
spec:
  replicas: 3
  selector:
    matchLabels:
      app: subdomain.domain.tld
  template:
    metadata:
      labels:
        app: subdomain.domain.tld
    spec:
      containers:
      - image: gcr.io/ourrepo/ourimage
        name: subdomain-domain-tld
        ports:
        - containerPort: 80
        readinessProbe:
          httpGet:
            path: /health.htm
            port: 80
          initialDelaySeconds: 60
          periodSeconds: 60
          timeoutSeconds: 10
        volumeMounts:
        - mountPath: C:\some-secrets
          name: some-secrets
      nodeSelector:
        kubernetes.io/os: windows
      volumes:
      - name: some-secrets
        secret:
          secretName: some-secrets
Service:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: subdomain.domain.tld
  name: subdomain-domain-tld-service
  namespace: subdomain-domain-tld
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: subdomain.domain.tld
  type: NodePort
The Ingress is extremely basic, as we have no real need for multiple routes on this site. However, I suspect that whatever issue we're having is here.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: gce
  labels:
    app: subdomain.domain.tld
  name: subdomain-domain-tld-ingress
  namespace: subdomain-domain-tld
spec:
  backend:
    serviceName: subdomain-domain-tld-service
    servicePort: 80
One last, somewhat relevant detail: I tried the steps in this documentation and they worked, but that setup is not identical to mine, as it uses neither Windows containers nor readiness probes: https://cloud.google.com/kubernetes-engine/docs/how-to/container-native-load-balancing#using-pod-readiness-feedback
Any suggestions would be greatly appreciated. I've spent two days stuck on this and I'm sure it's obvious but I just can't see the problem.