We are running Google Kubernetes Engine (version 1.18.17-gke.1200) with Workload Identity enabled. There are 6 nodes in the cluster. On two of the nodes, the gke-metadata-server Pod is failing with this error:
Metadata Server stopped unexpectedly: failed to prepare Metadata Server: listen tcp 0.0.0.0:988: bind: address already in use
Running netstat -tn on the problematic node, we see this:
tcp 0 0 10.0.11.218:988 172.16.1.10:2049 ESTABLISHED -
10.0.11.218 is the IP address of the node while 172.16.1.10 is the address of a Google Filestore instance.
My guess is that some other Pod on this node is connecting to Google Filestore (i.e., over NFS) from a low-numbered source port which, unluckily, turned out to be 988, the same port the GKE metadata server on each node needs to listen on.
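To check whether other nodes are affected by the same collision, a small script along these lines could scan the netstat output. This is only a rough sketch: the function name find_port_conflicts is made up for illustration, and it assumes the column layout of the netstat line shown above (protocol, two queue counters, local address, remote address, state).

```python
def find_port_conflicts(netstat_output, local_port=988, remote_port=2049):
    """Return (local, remote) address pairs for established TCP connections
    whose local port is local_port and whose remote port is remote_port."""
    conflicts = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: proto recv-q send-q local remote state [pid/program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        # The port is whatever follows the last colon in "address:port".
        if (local.rsplit(":", 1)[-1] == str(local_port)
                and remote.rsplit(":", 1)[-1] == str(remote_port)):
            conflicts.append((local, remote))
    return conflicts


# The line observed on the problematic node.
sample = "tcp 0 0 10.0.11.218:988 172.16.1.10:2049 ESTABLISHED -"
print(find_port_conflicts(sample))
# prints [('10.0.11.218:988', '172.16.1.10:2049')]
```

On a node where the NFS connection happened to pick a different source port, the function returns an empty list.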
Is there some way to tell GKE or the Pod to not use port 988 when connecting to NFS?