I've got an ACS Windows cluster running Kubernetes that is generally working well. I've deployed ASP.NET Core webapi and worker app containers to it. Both containers work fine locally and, for the most part, in ACS as well: I can scale them out and back in, deploy new versions, etc.
They run fine for a while and then suddenly start generating DNS resolution errors when trying to access external internet resources. I'm seeing exceptions like:
System.Net.Http.WinHttpException: The server name or address could not be resolved
The resources they are trying to access resolve fine, then suddenly stop resolving, and then after some indeterminate time (a few minutes, 20 minutes, or it seems even a few hours) they start resolving again; it's clearly quite intermittent. Note that these external resources are CosmosDB, Azure Queues, and a third-party logging service called Loggly (the point being they are all big web properties and are not at fault here). Also note that the two containers do not necessarily lose DNS at the same time.
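To try to pin down exactly when each container loses DNS, I'm thinking of running a small probe script alongside the app in each container. This is just a rough sketch: the Cosmos hostname is a placeholder for my real endpoint, and it assumes the base image includes the DnsClient PowerShell module (it does in the windowsservercore-based images I'm using):

# Log a timestamped line every minute indicating whether an external
# hostname currently resolves, so outage windows can be compared
# across the two containers.
# 'mydb.documents.azure.com' is a placeholder for my Cosmos endpoint.
while ($true) {
    $ts = Get-Date -Format o
    try {
        Resolve-DnsName -Name 'mydb.documents.azure.com' -ErrorAction Stop | Out-Null
        Add-Content -Path C:\dns-probe.log -Value "$ts OK"
    }
    catch {
        Add-Content -Path C:\dns-probe.log -Value "$ts FAIL: $($_.Exception.Message)"
    }
    Start-Sleep -Seconds 60
}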
I've tried opening a command shell inside the container:
kubectl exec -it {podname} -- powershell
And then using powershell to request a site:
Invoke-WebRequest -Uri www.google.com -OutFile test.txt
Get-Content test.txt
...and it works fine; I can reach google.com. So I have no idea how to debug this. Are there known issues with k8s on ACS that might be in play here?
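The next time it happens I'm planning to capture the DNS client configuration inside the affected pod and test resolution directly against the cluster DNS service, something along these lines (again just a sketch; 10.0.0.10 is what I believe is the default kube-dns service IP on ACS, so it would need checking with kubectl get svc -n kube-system):

# Show which DNS server addresses the container NIC is actually using
Get-DnsClientServerAddress -AddressFamily IPv4 | Format-Table InterfaceAlias, ServerAddresses

# Resolve an external name via the default path, then directly against
# the cluster DNS service, to see which hop is failing
Resolve-DnsName www.google.com
Resolve-DnsName www.google.com -Server 10.0.0.10

# Check for stale or negative entries in the local resolver cache,
# then flush it to see whether that clears the errors
ipconfig /displaydns | Select-String 'documents.azure.com'
ipconfig /flushdns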
I've deployed the same containers to a simple Server 2016 host and do not see the problem at all. So it seems to revolve around either k8s or the ACS cluster itself. I've rebuilt the ACS cluster 4 or 5 times in different regions (which use different k8s versions) and see exactly the same problem.
This is a major blocker for me. External internet access is obviously basic, core functionality, and my webapi and worker app are completely broken without it.