
It was a working setup, and no manual changes were made.

When we try to deploy an application on AKS, it fails to pull an image from ACR.

According to the kubectl describe po output:

Failed to pull image "xyz.azurecr.io/xyz:-beta-68": [rpc error: code = Unknown desc = Error response from daemon: Get https://xyz.azurecr.io/v2/: dial tcp: lookup rxyz.azurecr.io on [::1]:53: read udp [::1]:46256->[::1]:53: read: connection refused, rpc error: code = Unknown desc = Error response from daemon: Get https://xyz.azurecr.io/v2/: dial tcp: lookup xyz.azurecr.io on [::1]:53: read udp [::1]:46112->[::1]:53: read: connection refused, rpc error: code = Unknown desc = Error response from daemon: Get https://xyz.azurecr.io/v2/: dial tcp: lookup xyz.azurecr.io on [::1]:53: read udp [::1]:36677->[::1]:53: read: connection refused]

While troubleshooting, I realized that some nodes have the DNS entry in /etc/resolv.conf and pull images without issue, while other nodes are missing the DNS entry in /etc/resolv.conf and the image pull fails on them.
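To tell the two groups of nodes apart, a minimal sketch like the one below could be run on a node to list the nameserver entries it has. The error above shows lookups going to [::1]:53, which is consistent with a resolver falling back to localhost when the file lists no nameserver at all. The function names are hypothetical, and the sketch only reads /etc/resolv.conf as described in the question; it does not assume anything about how the node generates that file.

```python
def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text, in order."""
    servers = []
    for raw_line in resolv_conf_text.splitlines():
        line = raw_line.strip()
        # Lines starting with '#' or ';' are comments in resolv.conf.
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        if parts[0] == "nameserver" and len(parts) >= 2:
            servers.append(parts[1])
    return servers


def check_file(path="/etc/resolv.conf"):
    """Read a resolv.conf file and return its nameservers (empty list if none)."""
    with open(path) as f:
        return parse_nameservers(f.read())
```

On a node, print(check_file()) returning an empty list would mark it as one of the nodes where image pulls fail.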

If I manually add the DNS entry to /etc/resolv.conf on the nodes that lack it, the change is reverted to the initial state within a few minutes.

Is there a procedure to edit /etc/resolv.conf, or another way to fix the image pull issue?

sanjeeth
  • There is an ongoing issue with Canonical Ubuntu 18.04 leading to DNS errors. See https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119; this looks related. You can either reboot the underlying VMs directly in Azure or follow the procedure in the link. – Jul_DW Aug 30 '22 at 13:27
  • We were also affected by this ^^ issue on all three of our AKS clusters today. – erewok Aug 30 '22 at 17:40
  • This bit me too; no images can be pulled. – Jeremy Morren Aug 30 '22 at 23:16
  • Stopping and then immediately starting my VM worked. It had been stuck rebooting for over 15 minutes. – Gavin Thomas Aug 31 '22 at 01:13
  • I wasn't able to find a way to restart the underlying VMSS of my AKS cluster through the web portal. Where can I do that? I can't tell if I'm looking at a wrong place or just don't have sufficient permissions to do that. – Martin Melka Aug 31 '22 at 08:27
  • Ended up scaling the nodepool up and then back down /shrug – Martin Melka Aug 31 '22 at 12:35
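For anyone who, like Martin, cannot find the scale set in the portal, the two workarounds from these comments (restarting the VMSS instances, or scaling the node pool up and back down) can be scripted with the Azure CLI. This is a minimal sketch, assuming the az CLI is installed and logged in; the helper names and any resource names you pass in are hypothetical. The scale sets live in the cluster's node resource group (shown by az aks show as nodeResourceGroup), not in the resource group that holds the AKS resource itself. run() only prints by default, so the commands can be reviewed before anything is executed.

```python
import subprocess


def node_resource_group_cmd(resource_group, cluster):
    """az command that prints the node resource group of an AKS cluster."""
    return ["az", "aks", "show", "--resource-group", resource_group,
            "--name", cluster, "--query", "nodeResourceGroup", "--output", "tsv"]


def list_vmss_cmd(node_resource_group):
    """az command that lists the scale sets backing the cluster's node pools."""
    return ["az", "vmss", "list", "--resource-group", node_resource_group,
            "--query", "[].name", "--output", "tsv"]


def vmss_restart_cmd(node_resource_group, vmss_name):
    """az command that restarts every instance in one scale set."""
    return ["az", "vmss", "restart", "--resource-group", node_resource_group,
            "--name", vmss_name]


def nodepool_bounce_cmds(resource_group, cluster, pool, current_count, extra=1):
    """Commands that scale a node pool up by `extra` nodes, then back to `current_count`."""
    def scale(count):
        return ["az", "aks", "nodepool", "scale", "--resource-group", resource_group,
                "--cluster-name", cluster, "--name", pool, "--node-count", str(count)]
    return [scale(current_count + extra), scale(current_count)]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False and return its stdout."""
    print(" ".join(cmd))
    if dry_run:
        return ""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

For example, nrg = run(node_resource_group_cmd("my-rg", "my-aks"), dry_run=False) followed by run(list_vmss_cmd(nrg), dry_run=False) gives the scale set names to pass to vmss_restart_cmd. The scale command does not take a list of nodes to remove, so after scaling back down it is worth checking that the remaining nodes resolve DNS correctly.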

3 Answers


There is a bug in Ubuntu that impacts AKS globally. You can follow its status at https://status.azure.com/en-us/status. In addition, this thread has suggestions for working around the issue: https://learn.microsoft.com/en-us/answers/questions/987231/error-connecting-aks-with-acr.html


Restart the cluster; that will fix the problem. It started because of a DNS issue introduced by the Ubuntu team.
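One way to restart the whole cluster is to stop it and start it again with the Azure CLI's az aks stop and az aks start commands. Stopping takes all workloads offline until the start completes, so this is heavier than restarting individual scale sets. This is a minimal sketch, assuming the az CLI is installed and logged in; the helper names are hypothetical, and by default it only prints the commands.

```python
import subprocess


def cluster_restart_cmds(resource_group, cluster):
    """az commands that stop and then start an entire AKS cluster."""
    target = ["--resource-group", resource_group, "--name", cluster]
    return [["az", "aks", "stop", *target], ["az", "aks", "start", *target]]


def restart_cluster(resource_group, cluster, dry_run=True):
    """Print each command; run them in order only when dry_run is False."""
    cmds = cluster_restart_cmds(resource_group, cluster)
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence if 'az aks stop' fails.
            subprocess.run(cmd, check=True)
    return cmds
```

Calling restart_cluster("my-rg", "my-aks", dry_run=False), with your own (here hypothetical) resource group and cluster names, performs the stop followed by the start.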