We have recently set up an AKS cluster with an NGINX ingress controller.
Access seemed OK at first, but we then found that occasional requests are unable to connect.
To demonstrate the problem we use a short PowerShell script that makes repeated requests to the URL, writes out the response's status code, waits 0.5 seconds, then repeats.
$url = "https://staging.[...].[...]"
$i = 0
while($true)
{
$statusCode = "ERROR"
try{
$statusCode = (invoke-webrequest $url).statuscode
}
catch{
$statusCode = $_
}
$i = $i + 1
write-host "$i Checking $url --> $statusCode"
start-sleep -seconds 0.5
}
When I run this script it typically completes about 200 requests, each returning a 200 (OK) response, then pauses for about 30 seconds (which I assume is the timeout period of Invoke-WebRequest) and writes "Unable to connect to the remote server".
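(As a side note, the 30-second hang can be shortened by making the request timeout explicit in the loop body; the 5-second value below is just an illustrative choice to surface the failures faster.)
    $statusCode = (Invoke-WebRequest $url -TimeoutSec 5).StatusCode   # fail after 5s instead of the default timeout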
To debug the problem we enabled port-forwarding to bypass the load balancer and address the pod directly (adding the host header manually): no problem; the PowerShell script consistently showed 200 responses for at least 10 minutes.
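For reference, a sketch of that pod-level test, assuming kubectl access, a plain-HTTP container port of 80, and PowerShell 7; the pod name, namespace, and hostname are placeholders:
    # Forward a local port straight to the pod, bypassing both the load balancer and the ingress controller
    kubectl port-forward pod/<app-pod-name> 8080:80 -n <app-namespace>
    # In a second shell: same polling loop, but against the forwarded port, supplying the host header manually
    $headers = @{ Host = "staging.example.com" }   # placeholder hostname
    (Invoke-WebRequest "http://127.0.0.1:8080" -Headers $headers).StatusCode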
We also enabled port-forwarding to the NGINX controller and repeated the test: again, consistent 200 responses.
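The controller-level test was similar; a sketch assuming a standard ingress-nginx install (the service name and namespace may differ) and PowerShell 7 for -SkipCertificateCheck:
    # Forward a local port to the NGINX ingress controller service, still bypassing the load balancer
    kubectl port-forward svc/ingress-nginx-controller 8443:443 -n ingress-nginx
    # Same loop body, over TLS; the certificate will not match 127.0.0.1, hence -SkipCertificateCheck
    $headers = @{ Host = "staging.example.com" }   # placeholder hostname
    (Invoke-WebRequest "https://127.0.0.1:8443" -Headers $headers -SkipCertificateCheck).StatusCode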
But without port-forwarding enabled, requests to the URL - now going through the load balancer - show intermittent connection problems.
Strangely, when I run the script these connection failures occur every 200 or 201 requests, yet when a colleague ran the same script he saw no response on every 2nd or 3rd request. Repeating the test, I continue to see the failures at the same consistent interval.
UPDATE:
The load balancer looks like this...