
I have a k8s cluster deployed on GKE with two node pools: a "main" node pool containing 1 node, which hosts all the deployments, and another node pool containing 1 node for kube-ip.

On the main node pool, I would like to deploy 10 replicas of one of my applications (a Flask API). However, GKE is constantly killing my pods whenever the count exceeds 5, bringing the pod number back down to 5.

I tried modifying the values in my different YAML files (deployment.yaml and horizontalPodScheduler.yaml):

horizontalPodScheduler.yaml

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  labels:
    app: my-app
    env: production
  name: my-app-hpa
  namespace: production
spec:
  maxReplicas: 20
  metrics:
    - resource:
        name: cpu
        targetAverageUtilization: 80
      type: Resource
  minReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
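
One way to confirm whether this HPA is actually the controller pinning the count at 5 is to inspect its live status and events. A minimal sketch, assuming kubectl already points at this cluster and uses the names above:

# Show the HPA's current min/max, last-observed metrics, and desired replicas
kubectl -n production get hpa my-app-hpa

# The Events section reveals scaling decisions and metric-collection errors
kubectl -n production describe hpa my-app-hpa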

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: my-app
    env: production
  name: my-app
  namespace: production
spec:
  replicas: 10
  selector:
    matchLabels:
      app: my-app
      env: production
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: my-app
        env: production
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: my-pool
      containers:
        - image: gcr.io/my_project_id/github.com/my_org/my-app
          imagePullPolicy: IfNotPresent
          name: my-app-1
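
One detail worth noting: the container spec above declares no resources.requests, and a CPU-utilization target (targetAverageUtilization) is computed as a percentage of the requested CPU, so the HPA cannot evaluate its metric without one. A minimal sketch of the addition, with the 100m / 128Mi figures as placeholder assumptions rather than measured values:

      containers:
        - image: gcr.io/my_project_id/github.com/my_org/my-app
          imagePullPolicy: IfNotPresent
          name: my-app-1
          resources:
            requests:
              cpu: 100m      # placeholder; HPA CPU utilization is measured against this
              memory: 128Mi  # placeholder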

Even when I set those values, GKE always overwrites them and brings the count back to 5 replicas.

Here is the resource summary for my main node; you can see there are plenty of resources left to deploy the replicas (it's a pretty simple API):

I also tried using the "Scale" button in the GKE UI, but the results are the same...

[screenshot: main node resource summary]
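
Since something keeps writing spec.replicas back to 5, it can also help to see which client ("field manager") last set that field, and what the recent events say. A minimal sketch, assuming a cluster recent enough (1.21+) for kubectl to support --show-managed-fields:

# Each managedFields entry names the manager (kubectl, the HPA controller, a CI tool, ...) that owns spec.replicas
kubectl -n production get deployment my-app --show-managed-fields -o yaml

# Recent scaling events on the Deployment and its ReplicaSets
kubectl -n production get events --sort-by=.metadata.creationTimestamp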

  • Hope you are not running Argo CD, or anything else that keeps overwriting or changing the files again and again. – Harsh Manvar Oct 26 '21 at 09:56
  • I don't think I use Argo CD. – FairPluto Oct 26 '21 at 09:59
  • Very obvious, but have you increased your GCP Compute Engine quota? – DUDANF Oct 26 '21 at 10:07
  • I can't see how that would help; the node is already running and there are enough resources on it to schedule at least 6 pods, yet the max I can have is still 5. – FairPluto Oct 26 '21 at 10:12
  • Could it be that, since it is an autoscaler, it is scaling down? Could you possibly make multiple concurrent calls to your Flask API so GKE can "scale up"? I haven't used GKE in a while, but definitely have a look at your GCP quotas, Compute Engine and GKE both! – DUDANF Oct 26 '21 at 11:16
  • Yes, but I specified minReplicas to be at least 10. I agree it is autoscaled, but it should stay within the limits I set: min 10 and max 20. – FairPluto Oct 26 '21 at 13:05
  • Is there anything in the logs or events? Does `gcloud container operations list` show anything related? – mario Oct 26 '21 at 15:51
