
I own a GKE cluster on GCP, with 1 node pool containing 1 node (4 CPU / 16 GB RAM).

Today I tried to scale one of my applications to 10 replicas (we want to run lots of concurrent requests against it).

I first edited my horizontalPodAutoscaler.yaml and changed maxReplicas from 5 to 50 and minReplicas from 1 to 10.

Then I edited deployment.yaml and modified spec.replicas from 3 to 10.
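
For reference, applying both manifests would look something like this (commands are illustrative; the namespace is already set in each file's metadata):

# Apply the updated HPA and Deployment manifests
kubectl apply -f horizontalPodAutoscaler.yaml
kubectl apply -f deployment.yaml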

Now my deployment is stuck in a loop: it deploys the 10 pods, and as soon as all 10 are ready, it kills 5 of them to go back to 5, over and over.
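
The flip-flopping replica count can be watched with something like this (illustrative commands against the production namespace):

# Watch the Deployment's replica count flip back and forth
kubectl get deployment my-app -n production -w

# Current HPA status: desired replicas and observed CPU utilization
kubectl get hpa my-app-hpa -n production -w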

Here are screenshots of the state of the autoscaler during the loop; it looks like it tries to apply one configuration and the configuration immediately gets overwritten by the other.

[screenshot 1: autoscaler state during the loop]

[screenshot 2: autoscaler state during the loop]

Here are the config files I am using:

horizontalPodAutoscaler.yaml

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  labels:
    app: my-app
    env: production
  name: my-app-hpa
  namespace: production
spec:
  maxReplicas: 50
  metrics:
    - resource:
        name: cpu
        targetAverageUtilization: 80
      type: Resource
  minReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
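
(Side note: autoscaling/v2beta1 is deprecated and removed in newer Kubernetes versions; the same HPA expressed against autoscaling/v2 would look roughly like this, with the utilization value moved under target:)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
  namespace: production
spec:
  maxReplicas: 50
  minReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app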

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: my-app
    env: production
  name: my-app
  namespace: production
spec:
  replicas: 10
  selector:
    matchLabels:
      app: my-app
      env: production
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: my-app
        env: production
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: my-pool
      containers:
        - image: gcr.io/my_project_id/github.com/my_org/my-app
          imagePullPolicy: IfNotPresent
          name: my-app-1
          resources:
            requests:
              cpu: "50m"
  • I have replicated your environment, but I was unable to reproduce the issue. In my environment, when I change the number of replicas it autoscales to 10 and stays there without exhibiting the issue you are describing. I will continue to investigate this issue. – Gabriel Robledo Ahumada Oct 27 '21 at 15:03
  • Since this issue requires taking a look at the project's logs to troubleshoot further, I recommend opening a direct support case in GCP; here is the documentation on how to do it: [1] https://cloud.google.com/support/docs/manage-cases – Gabriel Robledo Ahumada Oct 29 '21 at 14:24
  • Can you post the results of `kubectl get hpa my-app-hpa` and/or `kubectl describe hpa my-app-hpa`? – Gari Singh Nov 02 '21 at 10:21
