I am running a Kubernetes Horizontal Pod Autoscaler (HPA) to scale Kafka consumers based on the consumer group lag. The HPA YAML manifest is shown below.

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: kafka-consumer-application
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kafka-consumer-application
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: kafka_consumergroup_lag
        target:
          type: AverageValue
          averageValue: 5

I observed that the HPA does not scale replicas strictly according to the formula ceil(currentReplicas * currentMetricValue / desiredMetricValue).

For instance, when the metric (consumer lag) was 108 with only one replica, Kubernetes scaled up to only 4 replicas (as shown in the screenshot below), while theoretically it should have scaled to 10 (the maximum number of replicas allowed).
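To make the expected number concrete, here is a minimal Go sketch of the formula quoted above, applied to the values from this question (1 replica, lag 108, target 5, `minReplicas` 1, `maxReplicas` 10). The function names `desiredReplicas` and `clampReplicas` are hypothetical helpers written for illustration. They are not taken from the HPA source.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the formula from the question:
// ceil(currentReplicas * currentMetricValue / desiredMetricValue).
// Hypothetical helper, not the controller's actual code.
func desiredReplicas(currentReplicas int32, currentMetric, desiredMetric float64) int32 {
	return int32(math.Ceil(float64(currentReplicas) * currentMetric / desiredMetric))
}

// clampReplicas bounds a proposed replica count to [minReplicas, maxReplicas].
func clampReplicas(proposed, minReplicas, maxReplicas int32) int32 {
	if proposed < minReplicas {
		return minReplicas
	}
	if proposed > maxReplicas {
		return maxReplicas
	}
	return proposed
}

func main() {
	raw := desiredReplicas(1, 108, 5) // ceil(1 * 108 / 5) = ceil(21.6)
	fmt.Println("raw desired replicas:", raw)
	fmt.Println("after min/max bounds:", clampReplicas(raw, 1, 10))
}
```

Under these assumptions the raw result is 22, which the `maxReplicas` bound reduces to 10. That is the value I expected instead of 4.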

[Screenshot: HPA status output]

Any idea what the reason is? Am I missing something, such as a maximum number of replicas that can be added per single iteration of the HPA reconciliation loop?

Please note the message in the screenshot: 'ScalingLimited True ScaleUpLimit the desired replica count is increasing faster than the maximum scale rate'. What does it mean?

Thanks.

Mazen Ezzeddine
  • This doesn't sound like an HPA issue; it is more likely a metrics server issue. We'd need to see the logs from metrics-server and/or the controller-manager. Did you take a look at https://github.com/kubernetes/kubernetes/issues/56165#issuecomment-401105690 ? – Malgorzata Feb 09 '21 at 13:32
  • Can you replace the image with the actual command you ran and the actual output it produced? – David Maze Feb 09 '21 at 13:40
  • @Malgorzata What do you mean by a metrics server issue? (I am using Prometheus and the Prometheus adapter.) As can be seen in the screenshot, the HPA current metric value is 108 and the desired metric value is 5, so it looks like an HPA issue. Yes, I have looked at the issue you mentioned, and also at the calculateScaleUpLimit function in the HPA code. Maybe the logic below explains the result above? (int32(math.Max(scaleUpLimitFactor*float64(currentReplicas), scaleUpLimitMinimum))) – Mazen Ezzeddine Feb 09 '21 at 13:46
  • @DavidMaze The methodology, commands and workflow are based on the following blogs https://medium.com/swlh/kafka-workers-autoscaling-with-horizontal-pod-autoscaler-e0f1d4dd6310 and https://medium.com/@ranrubin/horizontal-pod-autoscaling-hpa-triggered-by-kafka-event-f30fe99f3948 using GKE version 1.18.12-gke.1206 – Mazen Ezzeddine Feb 09 '21 at 13:49
  • No idea why this was downvoted; it's an important question. I think the `scaleUpLimitFactor` is probably to blame. I'm facing the same problem and noticed that the custom metrics scale up is limited by a factor of 2. – aaron Mar 23 '22 at 17:19
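
The `calculateScaleUpLimit` expression quoted in the comments can be tried out in isolation. The sketch below is not the controller's code: the constant values 2.0 and 4.0 are an assumption based on the upstream HPA controller source from around that release and are not stated anywhere in this thread, and `limitedReplicas` is a hypothetical helper. It also assumes the raw desired count stays at 22 on every reconciliation, which would only hold if the lag stayed at 108.

```go
package main

import (
	"fmt"
	"math"
)

// Assumed values; they are not given in the thread.
const (
	scaleUpLimitFactor  = 2.0
	scaleUpLimitMinimum = 4.0
)

// calculateScaleUpLimit mirrors the expression quoted in the comments.
func calculateScaleUpLimit(currentReplicas int32) int32 {
	return int32(math.Max(scaleUpLimitFactor*float64(currentReplicas), scaleUpLimitMinimum))
}

// limitedReplicas (hypothetical) caps the raw desired count by both the
// per-iteration scale-up limit and maxReplicas.
func limitedReplicas(currentReplicas, desired, maxReplicas int32) int32 {
	limit := calculateScaleUpLimit(currentReplicas)
	if limit > maxReplicas {
		limit = maxReplicas
	}
	if desired > limit {
		return limit
	}
	return desired
}

func main() {
	const desired, maxReplicas = 22, 10 // ceil(108/5) = 22, maxReplicas from the manifest
	current := int32(1)
	for current < maxReplicas {
		next := limitedReplicas(current, desired, maxReplicas)
		fmt.Printf("%d -> %d\n", current, next)
		current = next
	}
}
```

With those assumed constants, one replica can grow to at most max(2 × 1, 4) = 4 in a single step, which matches the 4 replicas observed and would account for the ScaleUpLimit condition. Later iterations would then allow 8 and finally 10.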
