
Sometimes I have a bunch of Jobs to launch, and each of them mounts a PVC. As our resources are limited, some pods fail to mount the volume within one minute:

Unable to mount volumes for pod "package-job-120348968617328640-5gv7s_vname(b059856a-ecfa-11ea-a226-fa163e205547)": timeout expired waiting for volumes to attach or mount for pod "vname"/"package-job-120348968617328640-5gv7s". list of unmounted volumes=[tmp]. list of unattached volumes=[log tmp].

It keeps retrying, but it never succeeds (the event age reads something like 44s (x11 over 23m)). However, if I delete the pod, the Job creates a new pod and it completes.

So why is this happening? Shouldn't the pod retry the mount automatically instead of needing manual intervention? And if this is unavoidable, is there a workaround that automatically deletes pods that have been stuck in the Init phase for more than 2 minutes?

Conclusion

It turned out that the attach script provided by my cloud provider was stuck on some of the nodes (caused by a network problem). So if others run into this problem, checking the storage plugin that attaches the disks is a good idea.
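
For anyone debugging a similar issue, the commands below are a rough sketch of how to narrow it down to the attach/mount layer. The pod name is a placeholder, and VolumeAttachment objects only exist when a CSI driver is in use.

    # The Events section of the stuck pod shows whether attaching or mounting fails
    kubectl describe pod <pod-name>

    # With CSI drivers, check whether the volume was ever attached to the node
    kubectl get volumeattachment

    # On the suspect node, inspect the kubelet logs for the volume plugin
    # (assuming a systemd-managed kubelet)
    journalctl -u kubelet | grep -iE "attach|mount"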

Nick Allen

2 Answers


So why is this happening? Shouldn't the pod retry the mount automatically instead of needing manual intervention? And if this is unavoidable, is there a workaround that automatically deletes pods that have been stuck in the Init phase for more than 2 minutes?

There can be multiple reasons for this. Do you see any Events on the Pod when you run kubectl describe pod <podname>? And do you reuse a PVC that another Pod used before?
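
For example, something along these lines can answer both questions (the names are placeholders):

    # Look at the Events section of the stuck pod
    kubectl describe pod <pod-name>

    # Or filter the events for that pod directly
    kubectl get events --field-selector involvedObject.name=<pod-name>

    # Inspect the PVC; the describe output also lists the pods currently mounting it
    kubectl describe pvc <pvc-name>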

My guess is that you use a regional cluster consisting of multiple datacenters (Availability Zones), and that your PVC is located in one AZ while your Pod is scheduled to run in a different AZ. In such a situation, the Pod will never be able to mount the volume, since the volume is located in another AZ.
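
If that is the case, you can compare the zone of the PersistentVolume with the zone of the node the Pod was scheduled to, roughly like this (a sketch; the exact labels depend on your provider and Kubernetes version):

    # Find the PV bound to the PVC
    kubectl get pvc <pvc-name> -o jsonpath='{.spec.volumeName}'

    # The zone usually appears as a label or under spec.nodeAffinity on the PV
    kubectl get pv <pv-name> -o yaml | grep -iE -A3 "zone|nodeAffinity"

    # Compare with the zone of each node
    # (older clusters use the failure-domain.beta.kubernetes.io/zone label instead)
    kubectl get nodes -L topology.kubernetes.io/zone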

Jonas
  • There is only one event, I reused the PVC, and I checked that we only have one AZ (I had hoped that this was the reason, but it's not). – Nick Allen Sep 03 '20 at 02:10
  • What kind of storage system do you use? Are the disks attached to the Nodes directly, so that Pods on other Nodes cannot use them? – Jonas Sep 03 '20 at 04:21
  • Thanks for your tips. Next time this problem occurs I will check whether new pods get scheduled to the same node as the old one but still succeed (see the sketch after this comment thread). – Nick Allen Sep 03 '20 at 07:23
  • Or are two pods using the same PVC at the same time? If you are using accessMode ReadWriteOnce, a volume can only be mounted at one Node at a time. – Jonas Sep 03 '20 at 08:49
  • No, there is only one pod. – Nick Allen Sep 03 '20 at 09:21
  • I think your theory is correct. I have three nodes: if pods are scheduled to node1 everything is smooth, but if they are scheduled to either of the other two nodes they always fail to mount the PVC. Strangely, these three nodes belong to the same AZ. I have contacted my cloud provider's support and am waiting for confirmation. – Nick Allen Sep 07 '20 at 02:23
  • I just confirmed that it is a bug in my cloud provider's storage plugin that prevents some of my nodes from mounting the PVC. Thanks for pointing me in the right direction. – Nick Allen Sep 08 '20 at 01:58
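
A quick way to check the node-placement correlation discussed in the comments above (a sketch):

    # The NODE column shows which node each pod, including replacements, landed on;
    # if failures cluster on specific nodes, the problem is on the node side
    kubectl get pods -o wide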

I had the same problem, even when the volume was attached to the same node where the pod was running.

I SSHed into the node and restarted the kubelet, which fixed the issue.
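
For reference, a rough sketch of that fix, assuming the node runs the kubelet as a systemd service (the node address is a placeholder):

    # SSH to the affected node
    ssh <user>@<node-ip>

    # Optionally check whether the kubelet reports attach/mount errors
    sudo journalctl -u kubelet | grep -iE "attach|mount"

    # Restart the kubelet; it re-runs the volume mount reconciliation
    sudo systemctl restart kubelet
    sudo systemctl status kubelet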

Thamaraiselvam