
We have recently set up an AKS cluster with an NGINX ingress controller.

Access seemed OK at first, but then we found that occasional requests fail to connect.

To demonstrate the problem we use a short PowerShell script that repeatedly requests the URL, writes out the response's status code, waits 0.5 seconds, then repeats.

$url = "https://staging.[...].[...]"
$i = 0

while($true)
{
    $statusCode = "ERROR"   

    try{
        $statusCode = (invoke-webrequest $url).statuscode
    }
    catch{
        $statusCode = $_
    }    

    $i = $i + 1
    write-host "$i Checking $url --> $statusCode"
    start-sleep -seconds 0.5
}

When I run this script it typically runs for about 200 requests, each returning a 200 (OK) response, then it pauses for about 30 seconds (which I assume is the timeout period of Invoke-WebRequest) and writes "Unable to connect to the remote server".


To debug the problem we enabled port-forwarding to bypass the load balancer and address the pod directly (adding the Host header, as sketched below): no problem; the PowerShell script consistently shows 200 responses for at least 10 minutes.
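
For reference, the pod-level test looked roughly like this (the pod name, namespace and hostname below are placeholders, not our real values):

# Forward a local port straight to the application pod, bypassing the load balancer
kubectl port-forward pod/&lt;app-pod-name&gt; 8080:80 -n &lt;namespace&gt;

# In a second console, send requests with the expected Host header.
# curl.exe is used here because Windows PowerShell's Invoke-WebRequest restricts
# setting the Host header via -Headers.
curl.exe -s -o NUL -w "%{http_code}\n" -H "Host: staging.example.com" http://localhost:8080/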

We also enabled port-forwarding to the NGINX ingress controller and repeated the test: again, consistent 200 responses.
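
Likewise for the controller (the default ingress-nginx service and namespace names are assumed here; yours may differ):

# Forward a local port to the NGINX ingress controller service
kubectl port-forward -n ingress-nginx svc/ingress-nginx-controller 8443:443

# -k skips certificate validation, since the certificate won't match localhost
curl.exe -k -s -o NUL -w "%{http_code}\n" -H "Host: staging.example.com" https://localhost:8443/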

But without port-forwarding enabled, requests to the URL - now going through the load balancer - show intermittent connection problems.

Strangely, when I run the script these connection problems happen every 200 or 201 requests, yet when a colleague ran the same script he got no response on every 2nd or 3rd request. I can repeat this and continue to see the connection problems at these consistent intervals.

UPDATE:
The load balancer looks like this...

[screenshot of the load balancer configuration]


1 Answer


I can't explain why, but we found that the problem went away when we changed the VMs in our node pool from burstable to non-burstable (from the 'B' class to the 'D' class).
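
For anyone in the same position, the change amounted to adding a new node pool and draining the old one, since AKS node pools can't change VM size in place. This is only a rough sketch with made-up resource group, cluster and pool names:

# Add a new node pool on D-series VMs
az aks nodepool add `
    --resource-group myResourceGroup `
    --cluster-name myAksCluster `
    --name dpool `
    --node-vm-size Standard_D4s_v3 `
    --node-count 3

# Cordon and drain the old B-series pool, then remove it
kubectl cordon -l agentpool=bpool
kubectl drain -l agentpool=bpool --ignore-daemonsets --delete-emptydir-data

az aks nodepool delete `
    --resource-group myResourceGroup `
    --cluster-name myAksCluster `
    --name bpool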
