
microk8s kubectl describe pod mysql-deployment-756f9d8cdf-8kzdw

Notice the 11 minute age.

Events:
  Type    Reason          Age   From     Message
  ----    ------          ----  ----     -------
  Normal  SandboxChanged  11m   kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal  Pulled          11m   kubelet  Container image "mysql:5.7" already present on machine
  Normal  Created         11m   kubelet  Created container mysql-container
  Normal  Started         11m   kubelet  Started container mysql-container

microk8s kubectl get pods -o wide

Notice the 41h age, and that the IP address changed about 11 minutes ago.

NAME                                        READY   STATUS    RESTARTS   AGE   IP             NODE                   NOMINATED NODE   READINESS GATES

mysql-deployment-756f9d8cdf-8kzdw           1/1     Running   3          41h   10.1.167.149   john-trx40-designare   <none>           <none>
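
For reference, a crude way to get a timestamped record of when the pod IP actually changes would be to poll get pods and append the output to a file; this is only a sketch, not something in my current setup:

# poll pod IPs once a minute and append to a log so IP changes can be timestamped
while true; do
  date
  microk8s kubectl get pods -o wide
  sleep 60
done >> pod-ip-history.log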

microk8s kubectl logs mysql-deployment-756f9d8cdf-8kzdw

The logs report a number of entries like:

2020-12-08T02:10:10.264100Z 32 [Note] Aborted connection 32 to db: 'jjg_script_db' user: 'root' host: '10.1.167.159' (Got an error reading communication packets)

Other pods report failing DNS lookups, then crash and get recreated.

It feels like an IP lease is running out, but I would rather look at a log than speculate. I say this because the SQL pod's age keeps increasing even as its sandbox and IP change. The exact same MySQL image and data stay operational for months under docker-compose, and the frequency is not tied to traffic.

sudo microk8s inspect produces a lot of log files. I have looked through every one, but I may have missed the critical events because of the sheer number of logs, and I am not sure where to look.

Where do I look up the logs for the reason / trigger behind SandboxChanged? If my guess is correct and it is an issue with the IP lease, where would I find the IP allocation logs for microk8s Kubernetes?
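
(My working assumption is that the per-service logs are reachable through journald, roughly as below; I am not sure which of them records the SandboxChanged trigger or the Calico IP assignments, so treat these as a sketch.)

# kubelet and container runtime logs (service names as documented for microk8s)
journalctl -u snap.microk8s.daemon-kubelet --since "1 hour ago"
journalctl -u snap.microk8s.daemon-containerd --since "1 hour ago"
# Calico CNI / IPAM activity (assuming the standard calico-node label in kube-system)
microk8s kubectl logs -n kube-system -l k8s-app=calico-node --tail=200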

My setup is an Ubuntu 18.04 host with very little installed except for docker, docker-compose, git, Visual Studio Code and microk8s Kubernetes.

Everything recovers, but the interruption is annoying, and not knowing where to look is driving me nuts.


Extra info requested by PjoterS

Ingress

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: ingress-myservice-jjg
  annotations:
    # use the shared ingress-nginx
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/affinity-mode: "balanced" # "persistent"
    # added next 2 lines for secured https after I got a certificate
    certmanager.k8s.io/cluster-issuer: "letsencrypt-issuer"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  # added tls lines for secured https after I got a certificate
  tls:
    - hosts:
        - ancient-script.org
        - www.ancient-script.org
      secretName: ancient-script-org-crt-secret
  rules:
  - host: ancient-script.org
    http:
      paths:
      - path: /
        backend:
          serviceName: express-service
          servicePort: 3000
  - host: www.ancient-script.org
    http:
      paths:
      - path: /
        backend:
          serviceName: express-service
          servicePort: 3000

Service and Deployment

apiVersion: v1
kind: Service
metadata:
  name: mysql-service
spec:
  ports:
  - protocol: TCP     # default is TCP
    port: 3306        # incoming port from within kubernetes
    targetPort: 3306  # default, port on the pod
  selector:
    app: mysql-pod
  clusterIP: None
---
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: mysql-deployment
spec:
  selector:
    matchLabels:
      app: mysql-pod
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mysql-pod
    spec:
      containers:
      - image: mysql:5.7
        name: mysql-container
        args: 
        - --sql-mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: 'todochange'
        - name: MYSQL_DATABASE
          value: ancient_script_db
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: mysql-persistent-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-persistent-storage
        persistentVolumeClaim:
          claimName: mysql-pv-claim

Logs from MySQL from around the same time the IP changed:

2020-12-10T19:39:43.795268Z 44 [Note] Aborted connection 44 to db: 'ancient_script_db' user: 'root' host: '10.1.167.141' (Got an error reading communication packets)
2020-12-10T19:39:43.796587Z 43 [Note] Aborted connection 43 to db: 'ancient_script_db' user: 'root' host: '10.1.167.141' (Got an error reading communication packets)
2020-12-10T19:39:43.796761Z 41 [Note] Aborted connection 41 to db: 'ancient_script_db' user: 'root' host: '10.1.167.141' (Got an error reading communication packets)
2020-12-10T19:39:43.796831Z 38 [Note] Aborted connection 38 to db: 'ancient_script_db' user: 'root' host: '10.1.167.141' (Got an error reading communication packets)
2020-12-10T19:39:43.796889Z 42 [Note] Aborted connection 42 to db: 'ancient_script_db' user: 'root' host: '10.1.167.141' (Got an error reading communication packets)

From another incident (they occur twice daily):

Notice the error message from microk8s when trying to get logs from another pod that uses MySQL; when microk8s finally returns the logs, they show DNS issues.

john@john-trx40-designare:~/Documents/GitHub/help-me-transcribe$ k describe pod express-deployment-64947b66b9-84vzc
Name:         express-deployment-64947b66b9-84vzc
Namespace:    default
Priority:     0
Node:         john-trx40-designare/99.153.71.9
Start Time:   Fri, 11 Dec 2020 12:27:13 -0600
Labels:       app=express-pod
              pod-template-hash=64947b66b9
Annotations:  cni.projectcalico.org/podIP: 10.1.167.153/32
              cni.projectcalico.org/podIPs: 10.1.167.153/32
Status:       Running
IP:           10.1.167.153
IPs:
  IP:           10.1.167.153
Controlled By:  ReplicaSet/express-deployment-64947b66b9
Containers:
  express:
    Container ID:   containerd://e9d178313638d3a8985caef57ae6e45f2b37b5ae08032e0eeba01c30a12676ce
    Image:          localhost:32000/express-server:20201211d
    Image ID:       localhost:32000/express-server@sha256:95fcc8679727820cff428f657ea7c32811681c2782e3692fbad0041ffcd3d935
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Fri, 11 Dec 2020 12:27:15 -0600
    Ready:          True
    Restart Count:  0
    Environment:
      mySQL_connection_limit:                  200
      mySQL_host:                              mysql-service
      mySQL_port:                              3306
      mySQL_user:                              root
      ROOT_PATH_user_files:                    /user_files
      ROOT_PATH_crop_images:                   /crop_images
      ROOT_PATH_bulk_input_AI_transcriptions:  /temp/FEB_2020_AI_transcription.txt #is this used?
      ROOT_PATH_CURRICULUM:                    /temp/curriculum/  #is this used?
      ROOT_PATH_transcribed_words:             /transcription_db/crops_sets/transcribed_words/ #is this used?
      ROOT_PATH_train_test:                    /transcription_db/train_test #is this used?
      IMAGE_SERVICE_ADDRESS:                   image-service
      TRANSCRIBE_SERVICE_ADDRESS:              transcription-service
      LOGIN_COOKIE_NAME:                       ancient_script_signed_login_token
    Mounts:
      /crop_images from name-ci (rw)
      /user_files from name-uf (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qnqqd (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  name-uf:
    Type:          HostPath (bare host directory volume)
    Path:          /mnt/disk2/Documents/help_me_transcribe/production/pv/user_files
    HostPathType:  
  name-ci:
    Type:          HostPath (bare host directory volume)
    Path:          /mnt/disk2/Documents/help_me_transcribe/production/pv/crop_images
    HostPathType:  
  default-token-qnqqd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-qnqqd
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  28m   default-scheduler  Successfully assigned default/express-deployment-64947b66b9-84vzc to john-trx40-designare
  Normal  Pulling    28m   kubelet            Pulling image "localhost:32000/express-server:20201211d"
  Normal  Pulled     28m   kubelet            Successfully pulled image "localhost:32000/express-server:20201211d" in 446.977029ms
  Normal  Created    28m   kubelet            Created container express
  Normal  Started    28m   kubelet            Started container express
john@john-trx40-designare:~/Documents/GitHub/help-me-transcribe$ k logs express-deployment-64947b66b9-84vzc
Error from server (NotFound): the server could not find the requested resource ( pods/log express-deployment-64947b66b9-84vzc)

john@john-trx40-designare:~/Documents/GitHub/help-me-transcribe$ k logs express-deployment-64947b66b9-84vzc
process.env.mySQL_host = mysql-service
process.env.mySQL_port = 3306
process.env.mySQL_user = root
initializing socketApi.js
FOCUS
TODO, need to process all user files not just john_grabner
Scanning all files to make sure present in database ... this will take some time
Listening on port 3000 when launched native or from docker networks, maybe remapped to host in docker-compose
Unhandled Rejection at: Promise Promise {
  <rejected> { Error: getaddrinfo ENOTFOUND mysql-service mysql-service:3306
    at errnoException (dns.js:55:10)
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:97:26)
    --------------------
    at Protocol._enqueue (/app/src/help-me-transcribe-server/node_modules/mysql/lib/protocol/Protocol.js:145:48)
    at Protocol.handshake (/app/src/help-me-transcribe-server/node_modules/mysql/lib/protocol/Protocol.js:52:23)
    at PoolConnection.connect (/app/src/help-me-transcribe-server/node_modules/mysql/lib/Connection.js:130:18)
    at Pool.getConnection (/app/src/help-me-transcribe-server/node_modules/mysql/lib/Pool.js:48:16)
    at Promise (/app/src/help-me-transcribe-server/help-me-data-access/sql.js:47:22)
    at new Promise (<anonymous>)
    at getConnection (/app/src/help-me-transcribe-server/help-me-data-access/sql.js:42:16)
    at query (/app/src/help-me-transcribe-server/help-me-data-access/sql.js:77:12)
    at query (/app/src/help-me-transcribe-server/help-me-data-access/sql.js:74:16)
    at Object.query_one_or_null (/app/src/help-me-transcribe-server/help-me-data-access/sql.js:101:12)
  code: 'ENOTFOUND',
  errno: 'ENOTFOUND',
  syscall: 'getaddrinfo',
  hostname: 'mysql-service',
  host: 'mysql-service',
  port: 3306,
  fatal: true } } reason: { Error: getaddrinfo ENOTFOUND mysql-service mysql-service:3306
    at errnoException (dns.js:55:10)
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:97:26)
    --------------------
    at Protocol._enqueue (/app/src/help-me-transcribe-server/node_modules/mysql/lib/protocol/Protocol.js:145:48)
    at Protocol.handshake (/app/src/help-me-transcribe-server/node_modules/mysql/lib/protocol/Protocol.js:52:23)
    at PoolConnection.connect (/app/src/help-me-transcribe-server/node_modules/mysql/lib/Connection.js:130:18)
    at Pool.getConnection (/app/src/help-me-transcribe-server/node_modules/mysql/lib/Pool.js:48:16)
    at Promise (/app/src/help-me-transcribe-server/help-me-data-access/sql.js:47:22)
    at new Promise (<anonymous>)
    at getConnection (/app/src/help-me-transcribe-server/help-me-data-access/sql.js:42:16)
    at query (/app/src/help-me-transcribe-server/help-me-data-access/sql.js:77:12)
    at query (/app/src/help-me-transcribe-server/help-me-data-access/sql.js:74:16)
    at Object.query_one_or_null (/app/src/help-me-transcribe-server/help-me-data-access/sql.js:101:12)
  code: 'ENOTFOUND',
  errno: 'ENOTFOUND',
  syscall: 'getaddrinfo',
  hostname: 'mysql-service',
  host: 'mysql-service',
  port: 3306,
  fatal: true }
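
A quick way to check whether cluster DNS can resolve the headless service at the moment of an incident might be a throwaway pod like the one below (the pod name is arbitrary, and busybox:1.28 is just a convenient image with a working nslookup):

# one-off DNS check against the headless MySQL service
microk8s kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup mysql-service.default.svc.cluster.local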

As per request, more info on the MySQL config:

# microk8s kubectl get pv --sort-by=.spec.capacity.storage --namespace=production
apiVersion: v1
kind: PersistentVolume
metadata:
  # namespace: production
  name: mysql-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/Disk2/Documents/help_me_transcribe/production/pv/mysql-pv-volume"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # namespace: production
  name: mysql-pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi

# microk8s kubectl get services --namespace=production                          # List all services in the namespace
apiVersion: v1
kind: Service
metadata:
  # namespace: production
  name: mysql-service
spec:
  #type: xxxxxxx      # default, ClusterIP: Exposes the Service on a cluster-internal IP
                      # NodePort: Exposes the Service on each Node's IP at a static port (the NodePort, 30000-32767)
  ports:
  - protocol: TCP     # default is TCP
    port: 3306        # incoming port from within kubernetes
    targetPort: 3306  # default, port on the pod
    #nodePort: 33306
  selector:
    app: mysql-pod
  clusterIP: None
---
# microk8s kubectl get pods --namespace=production
# microk8s kubectl get pods -o wide --namespace=production                     # List all pods in the current namespace, with more details
                                                         # notice the IP address 10.1.167.xxx ... use this for "MySQL Workbench"
# microk8s kubectl describe pods my-pod --namespace=production
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  # namespace: production
  name: mysql-deployment
spec:
  selector:
    matchLabels:
      app: mysql-pod
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mysql-pod
    spec:
      containers:
      - image: mysql:5.7
        name: mysql-container
        args: 
        - --sql-mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
        env:
        - name: MYSQL_DATABASE
          value: ancient_script_db
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: mysql-persistent-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-persistent-storage
        # hostPath:
        #   path: "/Disk2/Documents/help_me_transcribe/production/pv/mysql-pv-volume"
        persistentVolumeClaim:
          claimName: mysql-pv-claim

Jan 5, 2021 Update

Something in microk8s must be eating the output of "microk8s kubectl get events", since it normally reports "No resources found in default namespace." Fortunately, I just now captured a set of events that looks like the taint manager is doing something.

microk8s kubectl get events
LAST SEEN   TYPE      REASON                    OBJECT                                          MESSAGE
49m         Normal    Starting                  node/john-trx40-designare                       Starting kube-proxy.
49m         Normal    Starting                  node/john-trx40-designare                       Starting kubelet.
49m         Warning   InvalidDiskCapacity       node/john-trx40-designare                       invalid capacity 0 on image filesystem
49m         Normal    NodeHasSufficientMemory   node/john-trx40-designare                       Node john-trx40-designare status is now: NodeHasSufficientMemory
49m         Normal    NodeHasNoDiskPressure     node/john-trx40-designare                       Node john-trx40-designare status is now: NodeHasNoDiskPressure
49m         Normal    NodeHasSufficientPID      node/john-trx40-designare                       Node john-trx40-designare status is now: NodeHasSufficientPID
49m         Normal    NodeNotReady              node/john-trx40-designare                       Node john-trx40-designare status is now: NodeNotReady
49m         Normal    TaintManagerEviction      pod/image-deployment-78c4c9fd7f-5vllb           Cancelling deletion of Pod default/image-deployment-78c4c9fd7f-5vllb
49m         Normal    TaintManagerEviction      pod/mysql-deployment-756f9d8cdf-lbfrg           Cancelling deletion of Pod default/mysql-deployment-756f9d8cdf-lbfrg
49m         Normal    TaintManagerEviction      pod/express-deployment-6dbb578fbb-dmjqs         Cancelling deletion of Pod default/express-deployment-6dbb578fbb-dmjqs
49m         Normal    TaintManagerEviction      pod/transcription-deployment-84fddcdff8-7sd9d   Cancelling deletion of Pod default/transcription-deployment-84fddcdff8-7sd9d
47m         Normal    NodeAllocatableEnforced   node/john-trx40-designare                       Updated Node Allocatable limit across pods
47m         Normal    NodeReady                 node/john-trx40-designare                       Node john-trx40-designare status is now: NodeReady
47m         Normal    SandboxChanged            pod/transcription-deployment-84fddcdff8-7sd9d   Pod sandbox changed, it will be killed and re-created.
47m         Normal    SandboxChanged            pod/image-deployment-78c4c9fd7f-5vllb           Pod sandbox changed, it will be killed and re-created.
47m         Normal    Pulled                    pod/transcription-deployment-84fddcdff8-7sd9d   Container image "localhost:32000/py_transcribe_service:20201226d" already present on machine
47m         Normal    Created                   pod/transcription-deployment-84fddcdff8-7sd9d   Created container transcription-container
47m         Normal    Started                   pod/transcription-deployment-84fddcdff8-7sd9d   Started container transcription-container
47m         Normal    SandboxChanged            pod/express-deployment-6dbb578fbb-dmjqs         Pod sandbox changed, it will be killed and re-created.
47m         Normal    SandboxChanged            pod/mysql-deployment-756f9d8cdf-lbfrg           Pod sandbox changed, it will be killed and re-created.
47m         Normal    CREATE                    ingress/ingress-myservice-jjg                   Ingress default/ingress-myservice-jjg
47m         Normal    UPDATE                    ingress/ingress-myservice-jjg                   Ingress default/ingress-myservice-jjg
47m         Normal    Pulled                    pod/image-deployment-78c4c9fd7f-5vllb           Container image "localhost:32000/py_image_service:2020126a" already present on machine
47m         Normal    Created                   pod/image-deployment-78c4c9fd7f-5vllb           Created container image-container
47m         Normal    Pulled                    pod/mysql-deployment-756f9d8cdf-lbfrg           Container image "mysql:5.7" already present on machine
47m         Normal    Created                   pod/mysql-deployment-756f9d8cdf-lbfrg           Created container mysql-container
47m         Normal    Started                   pod/image-deployment-78c4c9fd7f-5vllb           Started container image-container
47m         Normal    Started                   pod/mysql-deployment-756f9d8cdf-lbfrg           Started container mysql-container
47m         Normal    Pulled                    pod/express-deployment-6dbb578fbb-dmjqs         Container image "localhost:32000/express-server:20201226b" already present on machine
47m         Normal    Created                   pod/express-deployment-6dbb578fbb-dmjqs         Created container express
47m         Normal    Started                   pod/express-deployment-6dbb578fbb-dmjqs         Started container express
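
Since events age out quickly (the kube-apiserver default event TTL is one hour, as far as I know), which would explain why get events usually comes back empty, a rolling watcher like the one below could keep a record for the next incident; again, just a sketch:

# keep a rolling record of events across all namespaces so they survive the event TTL
microk8s kubectl get events -A --watch -o wide >> events-history.log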

Next update:

inspection-report/snap.microk8s.daemon-controller-manager appears to contain a bunch of problems just before the pods are restarted and get new IP addresses.

Does anyone know what this means?

Jan 05 09:01:27 john-trx40-designare microk8s.daemon-controller-manager[1704]: E0105 09:01:27.587942 1704 cronjob_controller.go:123] Failed to extract job list: Get "https://127.0.0.1:16443/apis/batch/v1/jobs?limit=500": dial tcp 127.0.0.1:16443: connect: connection refused
Jan 05 09:01:27 john-trx40-designare microk8s.daemon-controller-manager[1704]: E0105 09:01:27.854454 1704 leaderelection.go:325] error retrieving resource lock kube-system/kube-controller-manager: Get "https://127.0.0.1:16443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=10s": dial tcp 127.0.0.1:16443: connect: connection refused
Jan 05 09:01:30 john-trx40-designare microk8s.daemon-controller-manager[1704]: E0105 09:01:30.286465 1704 reflector.go:138] k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server is currently unable to handle the request
Jan 05 09:01:30 john-trx40-designare microk8s.daemon-controller-manager[1704]: E0105 09:01:30.286510 1704 reflector.go:138] k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server is currently unable to handle the request
Jan 05 09:01:30 john-trx40-designare microk8s.daemon-controller-manager[1704]: E0105 09:01:30.286594 1704 reflector.go:138] k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server is currently unable to handle the request
Jan 05 09:01:30 john-trx40-designare microk8s.daemon-controller-manager[1704]: E0105 09:01:30.286889 1704 reflector.go:138] k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server is currently unable to handle the request
Jan 05 09:01:30 john-trx40-designare microk8s.daemon-controller-manager[1704]: E0105 09:01:30.286909 1704 reflector.go:138] k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server is currently unable to handle the request
Jan 05 09:01:30 john-trx40-designare microk8s.daemon-controller-manager[1704]: E0105 09:01:30.287010 1704 reflector.go:138] k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server is currently unable to handle the request
Jan 05 09:01:30 john-trx40-designare microk8s.daemon-controller-manager[1704]: E0105 09:01:30.288178 1704 reflector.go:138] k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server is currently unable to handle the request
Jan 05 09:01:30 john-trx40-designare microk8s.daemon-controller-manager[1704]: E0105 09:01:30.289698 1704 reflector.go:138] k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server is currently unable to handle the request
Jan 05 09:01:30 john-trx40-designare microk8s.daemon-controller-manager[1704]: E0105 09:01:30.289797 1704 reflector.go:138] k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server is currently unable to handle the request
Jan 05 09:01:30 john-trx40-designare microk8s.daemon-controller-manager[1704]: E0105 09:01:30.289898 1704 reflector.go:138] k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server is currently unable to handle the request
Jan 05 09:01:30 john-trx40-designare microk8s.daemon-controller-manager[1704]: E0105 09:01:30.293707 1704 reflector.go:138] k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server is currently unable to handle the request
Jan 05 09:01:30 john-trx40-designare microk8s.daemon-controller-manager[1704]: E0105 09:01:30.294074 1704 reflector.go:138] k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server is currently unable to handle the request
Jan 05 09:01:30 john-trx40-designare microk8s.daemon-controller-manager[1704]: E0105 09:01:30.294088 1704 reflector.go:138] k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server is currently unable to handle the request
Jan 05 09:01:30 john-trx40-designare microk8s.daemon-controller-manager[1704]: E0105 09:01:30.294092 1704 reflector.go:138] k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server is currently unable to handle the request
Jan 05 09:01:30 john-trx40-designare microk8s.daemon-controller-manager[1704]: E0105 09:01:30.294191 1704 reflector.go:138] k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server is currently unable to handle the request
Jan 05 09:01:30 john-trx40-designare microk8s.daemon-controller-manager[1704]: E0105 09:01:30.294221 1704 reflector.go:138] k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server is currently unable to handle the request
Jan 05 09:01:30 john-trx40-designare microk8s.daemon-controller-manager[1704]: E0105 09:01:30.294322 1704 reflector.go:138] k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server is currently unable to handle the request
Jan 05 09:01:30 john-trx40-designare microk8s.daemon-controller-manager[1704]: E0105 09:01:30.294869 1704 reflector.go:138] k8s.io/client-go/metadata/metadatainformer/informer.go:90: Failed to watch *v1.PartialObjectMetadata: the server is currently unable to handle the request
Jan 05 09:01:35 john-trx40-designare microk8s.daemon-controller-manager[1704]: I0105 09:01:35.877899 1704 request.go:655] Throttling request took 1.048035134s, request: GET:https://127.0.0.1:16443/apis/discovery.k8s.io/v1beta1?timeout=32s
Jan 05 09:01:37 john-trx40-designare microk8s.daemon-controller-manager[1704]: W0105 09:01:37.179011 1704 garbagecollector.go:703] failed to discover some groups: map[metrics.k8s.io/v1beta1:the server is currently unable to handle the request]
Jan 05 09:01:38 john-trx40-designare microk8s.daemon-controller-manager[1704]: I0105 09:01:38.242607 1704 node_lifecycle_controller.go:1195] Controller detected that all Nodes are not-Ready. Entering master disruption mode.
Jan 05 09:01:38 john-trx40-designare microk8s.daemon-controller-manager[1704]: I0105 09:01:38.255936 1704 event.go:291] "Event occurred" object="kube-system/dashboard-metrics-scraper-6c4568dc68-pzj4s" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" message="Cancelling deletion of Pod kube-system/dashboard-metrics-scraper-6c4568dc68-pzj4s"
Jan 05 09:01:38 john-trx40-designare microk8s.daemon-controller-manager[1704]: I0105 09:01:38.255960 1704 event.go:291] "Event occurred" object="kube-system/coredns-86f78bb79c-f5plg" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" message="Cancelling deletion of Pod kube-system/coredns-86f78bb79c-f5plg"
Jan 05 09:01:38 john-trx40-designare microk8s.daemon-controller-manager[1704]: I0105 09:01:38.255974 1704 event.go:291] "Event occurred" object="kube-system/calico-kube-controllers-847c8c99d-hkgn6" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" 
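
To me this reads as the controller-manager briefly losing its connection to the API server (the "dial tcp 127.0.0.1:16443 ... connection refused" lines), so I am also checking whether the API server itself restarted around that time, along these lines (service name assumed from the microk8s docs of that era):

# did the API server restart around 09:01 on Jan 05?
journalctl -u snap.microk8s.daemon-apiserver --since "2021-01-05 08:55" --until "2021-01-05 09:05"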
  • Could you share your `Deployment`, `service` (ingress?) manifests? How are you exposing your application? Could you share more details about your environment? Is it local or cloud? Could you also share some DB configuration details? Did you set any specific parameters like `max_allowed_packet`? Did you see [this thread](https://dba.stackexchange.com/questions/19135/mysql-error-reading-communication-packets)? – PjoterS Dec 09 '20 at 10:04
  • The environment is a desktop ubuntu 18.04. I installed microk8s as per instructions on their site. No VM running, not in cloud. – grabbag Dec 11 '20 at 15:29
  • It looks like issue is related with `MySQL` configuration? Could you share your DB configuration? Also, did you check [MySQL “Got an error reading communication packet”](https://www.percona.com/blog/2016/05/16/mysql-got-an-error-reading-communication-packet-errors/) article? – PjoterS Dec 16 '20 at 13:56
  • I think this is a big smoking gun. A TaintManagerEviction event appeared near the time my pod got a new IP. Real odd: the event disappears after a while. None of my pods use tolerations and I see no taint on my single node. – grabbag Jan 05 '21 at 18:59
  • Also created a ticket because it smells like a bug: https://github.com/ubuntu/microk8s/issues/1866 – grabbag Jan 06 '21 at 14:09
  • Is this microk8s installed with Calico? Could you look at the log of the Calico pod? It should be a DaemonSet in the kube-system namespace. – Althaf M Jan 06 '21 at 21:03
  • I followed the instructions at https://microk8s.io/docs: sudo snap install microk8s --classic --channel=latest/stable; microk8s status --wait-ready; microk8s enable dns storage registry host-access ingress – grabbag Jan 06 '21 at 21:41
  • microk8s kubectl get namespace NAME STATUS AGE kube-system Active 5d2h kube-public Active 5d2h kube-node-lease Active 5d2h default Active 5d2h container-registry Active 4d7h cert-manager Active 4d5h ingress Active 3d23h – grabbag Jan 06 '21 at 21:44
  • microk8s dumps all kinds of logs. can't place them in stackoverflow but did place them on https://github.com/ubuntu/microk8s/issues/1866 – grabbag Jan 06 '21 at 21:55
  • Did you check the kubelet logs? According to the docs the logs should be in `snap.microk8s.daemon-kubelet`. It could contain some more information. https://microk8s.io/docs/configuring-services. What about your container runtime - did you check the logs? Any restarts? What about the overlay network? Can you rule out that the machine is going into some kind of sleep mode? (can not tell if it is a bug, but it sounds like a problem between the container runtime and microk8s... good luck!) – cvoigt Jan 07 '21 at 22:07

1 Answer


https://github.com/ubuntu/microk8s/issues/2241 provides a solution.

Edit the kube-apiserver args file and add the line below:

/var/snap/microk8s/2695/args$ nano kube-apiserver

--bind-address=0.0.0.0
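
After adding the flag, MicroK8s needs to be restarted so kube-apiserver picks up the new argument; something like this should do it (the /var/snap/microk8s/current symlink can be used instead of a specific revision number):

# restart MicroK8s so kube-apiserver re-reads its args file
microk8s stop
microk8s start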