
Since yesterday, I have been facing a weird issue on K8s (using GKE).

I have a deployment with 1 pod running. Deleting the deployment used to terminate the pod and the replicaset along with it.

But now, if I delete the deployment, the replicaset does not get deleted and thus the pod keeps running.

Does anyone else have this issue? Or know of a way to resolve this?

Should I bring down the replica count of my deployment to 0 before deleting it, or is there some other solution?
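
For example, something like this for the scale-to-zero idea (just a sketch against the dummy example below; I have not verified that it avoids the problem):

# drain the pods through the replicaset first, then delete the deployment
kubectl -n default scale deployment dummy-deployment --replicas=0
kubectl -n default delete deployment dummy-deployment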

I am using v1.15.9-gke.24

Dummy example that reproduces the issue

dummy_deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dummy-deployment
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      name: dummy
  template:
    metadata:
      labels:
        name: dummy
    spec:
      serviceAccountName: dummy-serviceaccount
      nodeSelector:
    cloud.google.com/gke-nodepool: default-pool
      containers:
      - name: pause
        image: gcr.io/google_containers/pause
        resources:
          limits:
            memory: 100M
          requests:
            cpu: 100m
            memory: 100M

dummy_serviceaccount.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: dummy-serviceaccount
  namespace: default

Commands I run

kubectl apply -f dummy_serviceaccount.yaml
kubectl apply -f dummy_deployment.yaml
kubectl -n default get pods | grep dummy
kubectl delete deployments dummy-deployment
kubectl -n default get pods | grep dummy
kubectl -n default get replicasets | grep dummy

INTERESTING OBSERVATION

deployment.extensions "dummy-deployment" deleted

deployment.apps/dummy-deployment created

When creating a new deployment using kubectl apply, deployment.apps gets created. But when deleting a deployment using kubectl delete, deployment.extensions gets deleted.
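
One thing I may try (just a guess, not verified to help: pinning the API group explicitly so the create and the delete go through the same one):

kubectl -n default delete deployment.apps/dummy-deployment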

NO EVENTS show up in kubectl get events immediately after deleting the deployment using kubectl -n default delete deployment dummy-deployment

LOGS FROM kubectl get events immediately after creating the deployment

2m24s       Normal   Scheduled           pod/dummy-deployment-69946b945f-txvvr          Successfully assigned default/dummy-deployment-69946b945f-txvvr to gke-XXX-default-pool-c7779722-7j9x
2m23s       Normal   Pulling             pod/dummy-deployment-69946b945f-txvvr          Pulling image "gcr.io/google_containers/pause"
2m22s       Normal   Pulled              pod/dummy-deployment-69946b945f-txvvr          Successfully pulled image "gcr.io/google_containers/pause"
2m22s       Normal   Created             pod/dummy-deployment-69946b945f-txvvr          Created container pause
2m22s       Normal   Started             pod/dummy-deployment-69946b945f-txvvr          Started container pause
2m24s       Normal   SuccessfulCreate    replicaset/dummy-deployment-69946b945f         Created pod: dummy-deployment-69946b945f-txvvr
2m24s       Normal   ScalingReplicaSet   deployment/dummy-deployment                    Scaled up replica set dummy-deployment-69946b945f to 1

kubectl -n default get pods | grep dummy

BEFORE: empty

AFTER:

kubectl -n default get pods | grep dummy 
dummy-deployment-69946b945f-txvvr 1/1 Running 0 6s 

kubectl -n default get replicasets | grep dummy

BEFORE: empty

AFTER:

kubectl -n default get replicasets | grep dummy
dummy-deployment-69946b945f 1 1 1 12s
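
Since the replicaset survives the deployment delete, one diagnostic worth running (a sketch; cascading deletion relies on the garbage collector following ownerReferences, so checking whether the replicaset still points back at the deployment seems relevant):

# print the replicaset's ownerReferences; an empty result would mean
# the garbage collector has nothing to cascade from
kubectl -n default get replicaset dummy-deployment-69946b945f -o jsonpath='{.metadata.ownerReferences}'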

UPDATE on Jan 29, 2021: Here is the code I run to fully delete the deployment, its replicaset, and the pod

import subprocess

def delete_deployment(namespace, notebook_id):
    # Delete the deployment, then its replicaset, then the pods, since the
    # cascade is not happening on its own. Each pipeline lists the resources,
    # greps for the notebook id, collapses whitespace, cuts out the NAME
    # column, and feeds the names to kubectl delete. `xargs -r` (GNU xargs)
    # skips the delete entirely when grep finds nothing.
    command1 = f"kubectl -n {namespace} get deployments | grep {notebook_id} | sed 's/  */ /g' | cut -d' ' -f1 | xargs -r kubectl -n {namespace} delete deployments"
    command2 = f"kubectl -n {namespace} get replicasets | grep {notebook_id} | sed 's/  */ /g' | cut -d' ' -f1 | xargs -r kubectl -n {namespace} delete replicasets"
    command3 = f"kubectl -n {namespace} get pods | grep {notebook_id} | sed 's/  */ /g' | cut -d' ' -f1 | xargs -r kubectl -n {namespace} delete pods"

    command_final = f"{command1}; {command2}; {command3}"

    try:
        # check=True raises CalledProcessError if the combined command exits
        # non-zero (i.e. the exit status of the last pipeline in the sequence)
        subprocess.run(command_final, shell=True, check=True)
    except subprocess.CalledProcessError:
        return False

    return True
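
A possibly simpler alternative I have not tested: deleting all three resource kinds by label in one call. The replicaset and pod inherit name=dummy from the pod template, but the deployment's own metadata would also need a labels entry (name: dummy) added for this to catch it:

kubectl -n default delete deployments,replicasets,pods -l name=dummy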
crossvalidator
  • Let's start from the top: can you reproduce this issue? I mean, delete whatever replicasets you have remaining, and then verify you don't have any pods running. Now create the deployment, and delete it. If this issue still reproduces, please supply the deployment yaml and the **exact** commands you are issuing, in order, and I'll try to reproduce this on my cluster with the same k8s version. – omricoco May 19 '20 at 19:48
  • Too many dependent files: that deployment uses a service account, lots of K8s secrets, a persistent disk, and a priority class. It'd be hard for you to reproduce the exact environment in your cluster. Thanks! – crossvalidator May 19 '20 at 23:33
  • OK, I reproduced it using a simpler deployment. – crossvalidator May 19 '20 at 23:49
  • Please post the output of these commands **before and after** you apply anything: ```kubectl -n default get pods | grep dummy``` and ```kubectl -n default get replicasets | grep dummy``` – omricoco May 20 '20 at 07:04
  • I've tried to reproduce this issue on a new cluster but was not able to. In one of your commands you have a typo: it should be `dummy-deployment` instead of `dummy_deployment`. Are you able to gather any logs from the pod? Any specific output from `kubectl get events`? Does it happen all the time? Do you get any error if you manually delete the `ReplicaSet`? – PjoterS May 20 '20 at 11:47
  • Thanks for catching the typo. Fixed it. – crossvalidator May 20 '20 at 12:11
  • I have added more logs in the body of the question. Thank you for your help so far!! – crossvalidator May 20 '20 at 12:38
  • It looks like the deployment object is completely disconnected from the replicasets and pods when deleting. I am getting around this by deleting the deployment, replicaset, and pod, in that order, in one custom function. But deleting the deployment was working just fine until 2 days ago. – crossvalidator May 20 '20 at 12:41
  • When creating a deployment, ```deployment.apps``` gets created. When deleting a deployment, ```deployment.extensions``` gets deleted. Don't know if that's expected behavior. – crossvalidator May 20 '20 at 12:47
  • Are you still able to reproduce this scenario? What if you try the same on a new cluster with the same version? – PjoterS May 28 '20 at 16:34
  • It's still happening on my cluster. I created a new node pool and it still has the same issue. Honestly, I don't have time to create a new cluster at this point. I wrote a function to delete the deployment, replicaset, and pod in that order. – crossvalidator May 28 '20 at 18:53
  • Could you also provide the output of: `kubectl get deploy,pods,rs,rc`, `kubectl describe pod`, `kubectl describe rs`? – PjoterS Jun 03 '20 at 13:23
  • Thank you for following up. It looks like GKE resolved the issue on its own. For the past couple of days, deployment deletes have been working normally. STRANGE. Maybe it was some update GKE pushed that messed with the cluster, and they may have since fixed it. – crossvalidator Jun 04 '20 at 01:48
  • In my company we're facing the exact same issue, and we don't know why this is happening, because it worked for so long. Do you still have the commands to manually delete the corresponding replica set? – NFoerster Jan 29 '21 at 07:15
  • Hi @Thorgas, I have added my code in the OP. – crossvalidator Jan 29 '21 at 16:33

0 Answers