i have upgraded an EKS cluster to 1.23, and - apparently - since then all the Gitlab runner Pods (K8S executor) don't really terminate, but stay in "Terminating" state seemingly forever. If I explicitly delete them using kubectl
they "leave".
Some observations & facts:
- the pods don't have volumes attached
- K9S shows the containers within the Pod in "ERROR" state, but gives no indication as to why.
- there is no finalizer attached (the one I see on the terminating Pods,
foregroundDeletion
, just seems to indicate that this Pod is being deleted right now - GitLab does not show errors, neither in the Pod logs nor in the UI; the CI jobs complete successfully
A lot of discussions hint to finalizers, but again: there are none attached, and also, as I understand it, only removing the finalizers would make them finally terminate and go, and that is not the behavior I see. I also found this SO question, but it has not helped me diagnosing or solving the issue.
I don't really know where to keep looking. Apart from GitLab runners everything seems to run just fine, but the configuration of them did not change.
Any help, here? Greatly appreciated.
UPDATE: I now realized that apparently the pod is gone on the underlying machine, at least there's no trace in docker ps
. Also this night two hosts rotated, and a bunch of Pods are still showing as being on those hosts (which don't exist any more).
Even more puzzled.
UPDATE II: managed to grab a /var/log/messages
log from the underlying host for a Terminated
container, see below.
# /var/log/messages | grep ...
Mar 8 09:30:45 ip-10-0-136-6 kubelet: I0308 09:30:45.823617 3693 kubelet.go:2120] "SyncLoop ADD" source="api" pods=[gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq]
Mar 8 09:30:45 ip-10-0-136-6 kubelet: I0308 09:30:45.946041 3693 reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"scripts\" (UniqueName: \"kubernetes.io/empty-dir/08eb951f-8b11-44b2-b106-42719ecff1ba-scripts\") pod \"runner-szk7coz8-project-12345678-concurrent-0dwfcq\" (UID: \"08eb951f-8b11-44b2-b106-42719ecff1ba\") " pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq"
Mar 8 09:30:45 ip-10-0-136-6 kubelet: I0308 09:30:45.946155 3693 reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"aws-iam-token\" (UniqueName: \"kubernetes.io/projected/08eb951f-8b11-44b2-b106-42719ecff1ba-aws-iam-token\") pod \"runner-szk7coz8-project-12345678-concurrent-0dwfcq\" (UID: \"08eb951f-8b11-44b2-b106-42719ecff1ba\") " pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq"
Mar 8 09:30:45 ip-10-0-136-6 kubelet: I0308 09:30:45.946217 3693 reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"docker-certs\" (UniqueName: \"kubernetes.io/empty-dir/08eb951f-8b11-44b2-b106-42719ecff1ba-docker-certs\") pod \"runner-szk7coz8-project-12345678-concurrent-0dwfcq\" (UID: \"08eb951f-8b11-44b2-b106-42719ecff1ba\") " pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq"
Mar 8 09:30:45 ip-10-0-136-6 kubelet: I0308 09:30:45.946241 3693 reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"logs\" (UniqueName: \"kubernetes.io/empty-dir/08eb951f-8b11-44b2-b106-42719ecff1ba-logs\") pod \"runner-szk7coz8-project-12345678-concurrent-0dwfcq\" (UID: \"08eb951f-8b11-44b2-b106-42719ecff1ba\") " pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq"
Mar 8 09:30:45 ip-10-0-136-6 kubelet: I0308 09:30:45.946279 3693 reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-2xzgj\" (UniqueName: \"kubernetes.io/projected/08eb951f-8b11-44b2-b106-42719ecff1ba-kube-api-access-2xzgj\") pod \"runner-szk7coz8-project-12345678-concurrent-0dwfcq\" (UID: \"08eb951f-8b11-44b2-b106-42719ecff1ba\") " pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq"
Mar 8 09:30:45 ip-10-0-136-6 kubelet: I0308 09:30:45.946307 3693 reconciler.go:238] "operationExecutor.VerifyControllerAttachedVolume started for volume \"repo\" (UniqueName: \"kubernetes.io/empty-dir/08eb951f-8b11-44b2-b106-42719ecff1ba-repo\") pod \"runner-szk7coz8-project-12345678-concurrent-0dwfcq\" (UID: \"08eb951f-8b11-44b2-b106-42719ecff1ba\") " pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq"
Mar 8 09:30:46 ip-10-0-136-6 kubelet: I0308 09:30:46.047362 3693 reconciler.go:293] "operationExecutor.MountVolume started for volume \"scripts\" (UniqueName: \"kubernetes.io/empty-dir/08eb951f-8b11-44b2-b106-42719ecff1ba-scripts\") pod \"runner-szk7coz8-project-12345678-concurrent-0dwfcq\" (UID: \"08eb951f-8b11-44b2-b106-42719ecff1ba\") " pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq"
Mar 8 09:30:46 ip-10-0-136-6 kubelet: I0308 09:30:46.047460 3693 reconciler.go:293] "operationExecutor.MountVolume started for volume \"aws-iam-token\" (UniqueName: \"kubernetes.io/projected/08eb951f-8b11-44b2-b106-42719ecff1ba-aws-iam-token\") pod \"runner-szk7coz8-project-12345678-concurrent-0dwfcq\" (UID: \"08eb951f-8b11-44b2-b106-42719ecff1ba\") " pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq"
Mar 8 09:30:46 ip-10-0-136-6 kubelet: I0308 09:30:46.047533 3693 reconciler.go:293] "operationExecutor.MountVolume started for volume \"docker-certs\" (UniqueName: \"kubernetes.io/empty-dir/08eb951f-8b11-44b2-b106-42719ecff1ba-docker-certs\") pod \"runner-szk7coz8-project-12345678-concurrent-0dwfcq\" (UID: \"08eb951f-8b11-44b2-b106-42719ecff1ba\") " pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq"
Mar 8 09:30:46 ip-10-0-136-6 kubelet: I0308 09:30:46.047570 3693 reconciler.go:293] "operationExecutor.MountVolume started for volume \"logs\" (UniqueName: \"kubernetes.io/empty-dir/08eb951f-8b11-44b2-b106-42719ecff1ba-logs\") pod \"runner-szk7coz8-project-12345678-concurrent-0dwfcq\" (UID: \"08eb951f-8b11-44b2-b106-42719ecff1ba\") " pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq"
Mar 8 09:30:46 ip-10-0-136-6 kubelet: I0308 09:30:46.047643 3693 reconciler.go:293] "operationExecutor.MountVolume started for volume \"kube-api-access-2xzgj\" (UniqueName: \"kubernetes.io/projected/08eb951f-8b11-44b2-b106-42719ecff1ba-kube-api-access-2xzgj\") pod \"runner-szk7coz8-project-12345678-concurrent-0dwfcq\" (UID: \"08eb951f-8b11-44b2-b106-42719ecff1ba\") " pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq"
Mar 8 09:30:46 ip-10-0-136-6 kubelet: I0308 09:30:46.047699 3693 reconciler.go:293] "operationExecutor.MountVolume started for volume \"repo\" (UniqueName: \"kubernetes.io/empty-dir/08eb951f-8b11-44b2-b106-42719ecff1ba-repo\") pod \"runner-szk7coz8-project-12345678-concurrent-0dwfcq\" (UID: \"08eb951f-8b11-44b2-b106-42719ecff1ba\") " pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq"
Mar 8 09:30:46 ip-10-0-136-6 kubelet: I0308 09:30:46.047815 3693 operation_generator.go:756] "MountVolume.SetUp succeeded for volume \"scripts\" (UniqueName: \"kubernetes.io/empty-dir/08eb951f-8b11-44b2-b106-42719ecff1ba-scripts\") pod \"runner-szk7coz8-project-12345678-concurrent-0dwfcq\" (UID: \"08eb951f-8b11-44b2-b106-42719ecff1ba\") " pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq"
Mar 8 09:30:46 ip-10-0-136-6 kubelet: I0308 09:30:46.048095 3693 operation_generator.go:756] "MountVolume.SetUp succeeded for volume \"repo\" (UniqueName: \"kubernetes.io/empty-dir/08eb951f-8b11-44b2-b106-42719ecff1ba-repo\") pod \"runner-szk7coz8-project-12345678-concurrent-0dwfcq\" (UID: \"08eb951f-8b11-44b2-b106-42719ecff1ba\") " pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq"
Mar 8 09:30:46 ip-10-0-136-6 kubelet: I0308 09:30:46.048277 3693 operation_generator.go:756] "MountVolume.SetUp succeeded for volume \"logs\" (UniqueName: \"kubernetes.io/empty-dir/08eb951f-8b11-44b2-b106-42719ecff1ba-logs\") pod \"runner-szk7coz8-project-12345678-concurrent-0dwfcq\" (UID: \"08eb951f-8b11-44b2-b106-42719ecff1ba\") " pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq"
Mar 8 09:30:46 ip-10-0-136-6 kubelet: I0308 09:30:46.050600 3693 operation_generator.go:756] "MountVolume.SetUp succeeded for volume \"docker-certs\" (UniqueName: \"kubernetes.io/empty-dir/08eb951f-8b11-44b2-b106-42719ecff1ba-docker-certs\") pod \"runner-szk7coz8-project-12345678-concurrent-0dwfcq\" (UID: \"08eb951f-8b11-44b2-b106-42719ecff1ba\") " pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq"
Mar 8 09:30:46 ip-10-0-136-6 kubelet: I0308 09:30:46.061048 3693 operation_generator.go:756] "MountVolume.SetUp succeeded for volume \"aws-iam-token\" (UniqueName: \"kubernetes.io/projected/08eb951f-8b11-44b2-b106-42719ecff1ba-aws-iam-token\") pod \"runner-szk7coz8-project-12345678-concurrent-0dwfcq\" (UID: \"08eb951f-8b11-44b2-b106-42719ecff1ba\") " pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq"
Mar 8 09:30:46 ip-10-0-136-6 kubelet: I0308 09:30:46.062031 3693 operation_generator.go:756] "MountVolume.SetUp succeeded for volume \"kube-api-access-2xzgj\" (UniqueName: \"kubernetes.io/projected/08eb951f-8b11-44b2-b106-42719ecff1ba-kube-api-access-2xzgj\") pod \"runner-szk7coz8-project-12345678-concurrent-0dwfcq\" (UID: \"08eb951f-8b11-44b2-b106-42719ecff1ba\") " pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq"
Mar 8 09:30:46 ip-10-0-136-6 kubelet: I0308 09:30:46.159234 3693 kuberuntime_manager.go:487] "No sandbox for pod can be found. Need to start a new one" pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq"
Mar 8 09:30:46 ip-10-0-136-6 kubelet: I0308 09:30:46.835414 3693 kubelet.go:2158] "SyncLoop (PLEG): event for pod" pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq" event=&{ID:08eb951f-8b11-44b2-b106-42719ecff1ba Type:ContainerDied Data:258f2671f11654204119270288364af695c332dc109c16e4eeb92ea8fed8a82c}
Mar 8 09:30:47 ip-10-0-136-6 kubelet: I0308 09:30:47.873037 3693 kubelet.go:2158] "SyncLoop (PLEG): event for pod" pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq" event=&{ID:08eb951f-8b11-44b2-b106-42719ecff1ba Type:ContainerStarted Data:258f2671f11654204119270288364af695c332dc109c16e4eeb92ea8fed8a82c}
Mar 8 09:30:47 ip-10-0-136-6 kubelet: I0308 09:30:47.873611 3693 kubelet.go:2158] "SyncLoop (PLEG): event for pod" pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq" event=&{ID:08eb951f-8b11-44b2-b106-42719ecff1ba Type:ContainerDied Data:98f1518e3d2f4ba7d66e1fd5238e49e928f81c5fe044e7bcbf408ec98ff7e45c}
Mar 8 09:30:48 ip-10-0-136-6 kubelet: I0308 09:30:48.926561 3693 kubelet.go:2158] "SyncLoop (PLEG): event for pod" pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq" event=&{ID:08eb951f-8b11-44b2-b106-42719ecff1ba Type:ContainerStarted Data:c3ffd5d9b1cf640b9af95efb271aeddc2ee1f96d84fa8e118d8ce778e33bbfdc}
Mar 8 09:30:48 ip-10-0-136-6 kubelet: I0308 09:30:48.926604 3693 kubelet.go:2158] "SyncLoop (PLEG): event for pod" pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq" event=&{ID:08eb951f-8b11-44b2-b106-42719ecff1ba Type:ContainerStarted Data:7a2c23f5a512d782af165b0a5d95dd55823202aef31cdc56998832a39b60017d}
Mar 8 09:37:34 ip-10-0-136-6 kubelet: I0308 09:37:34.802812 3693 kubelet.go:2136] "SyncLoop DELETE" source="api" pods=[gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq]
Mar 8 09:37:34 ip-10-0-136-6 kubelet: I0308 09:37:34.804708 3693 kuberuntime_container.go:723] "Killing container with a grace period" pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq" podUID=08eb951f-8b11-44b2-b106-42719ecff1ba containerName="build" containerID="docker://7a2c23f5a512d782af165b0a5d95dd55823202aef31cdc56998832a39b60017d" gracePeriod=1
Mar 8 09:37:34 ip-10-0-136-6 kubelet: I0308 09:37:34.805281 3693 kuberuntime_container.go:723] "Killing container with a grace period" pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq" podUID=08eb951f-8b11-44b2-b106-42719ecff1ba containerName="helper" containerID="docker://c3ffd5d9b1cf640b9af95efb271aeddc2ee1f96d84fa8e118d8ce778e33bbfdc" gracePeriod=1
Mar 8 09:37:36 ip-10-0-136-6 kubelet: I0308 09:37:36.705701 3693 kubelet.go:2158] "SyncLoop (PLEG): event for pod" pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq" event=&{ID:08eb951f-8b11-44b2-b106-42719ecff1ba Type:ContainerDied Data:c3ffd5d9b1cf640b9af95efb271aeddc2ee1f96d84fa8e118d8ce778e33bbfdc}
Mar 8 09:37:36 ip-10-0-136-6 kubelet: I0308 09:37:36.705852 3693 kubelet.go:2158] "SyncLoop (PLEG): event for pod" pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq" event=&{ID:08eb951f-8b11-44b2-b106-42719ecff1ba Type:ContainerDied Data:7a2c23f5a512d782af165b0a5d95dd55823202aef31cdc56998832a39b60017d}
Mar 8 09:37:36 ip-10-0-136-6 kubelet: I0308 09:37:36.705876 3693 kubelet.go:2158] "SyncLoop (PLEG): event for pod" pod="gitlab/runner-szk7coz8-project-12345678-concurrent-0dwfcq" event=&{ID:08eb951f-8b11-44b2-b106-42719ecff1ba Type:ContainerDied Data:258f2671f11654204119270288364af695c332dc109c16e4eeb92ea8fed8a82c}
Mar 8 09:37:41 ip-10-0-136-6 kubelet: I0308 09:37:41.984526 3693 kubelet.go:2158] "SyncLoop (PLEG): event for pod" pod="gitlab/runner-szk7co