
I am playing around with the Horizontal Pod Autoscaler in Kubernetes. I've set the HPA to start new instances once the average CPU utilization passes 35%. However, this does not seem to work as expected: the HPA triggers a rescale even though the CPU utilization is far below the defined target. As seen below, the "current" utilization is 10%, which is far from 35%. But still, it rescaled the number of pods from 5 to 6.

I've also checked the metrics in my Google Cloud Platform dashboard (where we host the application). These also show that the CPU utilization hasn't surpassed the 35% threshold, yet several rescales occurred.

The content of my HPA:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: django
spec:
{{ if eq .Values.env "prod" }}
  minReplicas: 5
  maxReplicas: 35
{{ else if eq .Values.env "staging" }}
  minReplicas: 1
  maxReplicas: 3
{{ end }}
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: django-app
  targetCPUUtilizationPercentage: 35

Does anyone know what the cause of this might be?

  • Did you specify any CPU/memory limits in your pods? That percentage is an average across all pods, but it is bound to the limits. If you specify a limit of "500m" for CPU, your 35% will be inside this "50%" limit. – Hector Vido Mar 19 '21 at 04:18
  • Does this answer your question? [Kubernetes HPA wrong metrics?](https://stackoverflow.com/questions/60811207/kubernetes-hpa-wrong-metrics) – Alex G Mar 19 '21 at 07:08
  • @HectorVido Yes, I've defined a limit for the pods. It is currently set to the following: `limits: cpu: 400m memory: 700Mi requests: cpu: 200m memory: 350Mi`. I don't quite get what you mean by "If you specify a limit of "500m" for cpu, your 35% will be inside this "50%" limit." – Jeroen Beljaars Mar 19 '21 at 12:45
  • If you specify `cpu: 400m`, that means your pod can only access 40% of a core. When you specify `targetCPUUtilizationPercentage: 35`, you are asking to scale when pods are consuming `140m`, or 14% of a core. – Hector Vido Mar 19 '21 at 13:15
  • @HectorVido After doing some additional research, I think I finally got it. The targetCPUUtilizationPercentage scales when the average CPU utilization of a deployment surpasses 35% of its configured "requests" value. So in the case of `requests: cpu: 200m`, an autoscale will trigger once it hits 70m, or 7% of a core. – Jeroen Beljaars Mar 20 '21 at 15:50
  • Yes man! I'll write an answer here just to clarify for other people. – Hector Vido Mar 20 '21 at 20:10
  • If it's one pod, @HectorVido's answer holds well. "TargetUtilizationPercentage" is a "mean" of all Pods targeted by the HPA. – PraveenMak Sep 12 '22 at 16:43
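
To make the arithmetic from these comments concrete, here is a minimal sketch of the resources block quoted above; the HPA computes utilization against requests, not limits:

resources:
  requests:
    cpu: 200m      # HPA utilization is measured against this value
    memory: 350Mi
  limits:
    cpu: 400m      # caps actual usage, but does not enter the HPA calculation
    memory: 700Mi

With targetCPUUtilizationPercentage: 35, the per-pod usage that counts as "on target" is 200m * 0.35 = 70m, i.e. 7% of a core.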

2 Answers


Scaling is based on % of requests, not limits. I think the accepted answer should be changed, as its examples show:

limits:
  cpu: 1000m

But targetCPUUtilizationPercentage is based on requests, like:

requests:
  cpu: 1000m

For per-pod resource metrics (like CPU), the controller fetches the metrics from the resource metrics API for each Pod targeted by the HorizontalPodAutoscaler. Then, if a target utilization value is set, the controller calculates the utilization value as a percentage of the equivalent resource request on the containers in each Pod. If a target raw value is set, the raw metric values are used directly. The controller then takes the mean of the utilization or the raw value (depending on the type of target specified) across all targeted Pods, and produces a ratio used to scale the number of desired replicas.

https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#how-does-a-horizontalpodautoscaler-work
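
The linked page also gives the formula the controller applies, quoted here from the documentation:

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]

Applied to the screenshot in the question: with 5 replicas at 10% current utilization against a 35% target, desiredReplicas = ceil(5 * 10 / 35) = 2, so that reading alone should not trigger a scale-up.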

– Drew

This is tricky and could be a bug, but I don't think so; most of the time, people configure values that are too low, as I'll explain.

How targetCPUUtilizationPercentage relates to a Pod's CPU limits

The targetCPUUtilizationPercentage configures a percentage based on all the CPU a pod can use. On Kubernetes, we can't create an HPA without specifying some limits on CPU usage.

Let's assume these are our limits:

apiVersion: v1
kind: Pod
metadata:
  name: apache
spec:
  containers:
    - name: apache
      image: httpd:alpine
      resources:
        limits:
          cpu: 1000m

And in our targetCPUUtilizationPercentage inside the HPA we specify 75%.

That is easy to explain: we ask for 100% (1000m = 1 CPU core) of a single core, so when this core is at about 75% usage, the HPA will start to work.

But if we define our limits like this:

spec:
  containers:
    - name: apache
      image: httpd:alpine
      resources:
        limits:
          cpu: 500m

Now, 100% of the CPU our pod can utilize is only 50% of a single core, so 100% of CPU usage from this pod means, on the hardware, 50% usage of a single core.

targetCPUUtilizationPercentage is indifferent to this: if we keep our value of 75%, the HPA will start to work when our single core is at about 37.5% usage, because that is 75% of all the CPU this pod can consume.

From the perspective of the pod and the HPA, they never know that they are limited on CPU or memory.
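
A side note that reconciles this answer with the other one: the official formula divides by the pod's requests, but when a container sets a CPU limit and no request, Kubernetes defaults the request to the limit. In the examples above no request is given, so both readings yield the same threshold:

resources:
  limits:
    cpu: 500m
  # no requests set: Kubernetes defaults requests.cpu to 500m,
  # so a 75% target fires at 500m * 0.75 = 375m under either reading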

Understanding the scenario in the question above

With some programs, like the one used in the question above, CPU spikes do occur, but only in small timeframes (for example, 10-second spikes). Because these spikes are so short, the metrics server doesn't record them in the dashboard; it only keeps the value sampled at each 1m window, so a spike falling between windows is excluded. This explains why the spike cannot be seen in the metrics dashboards but is still picked up by the HPA.

Thus, for services with low CPU limits, a larger scale-up window (the scaleUp settings under the HPA's behavior field) can be ideal; see the sketch below.
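
A minimal sketch of such a window, assuming the cluster supports the autoscaling/v2 API (the question's manifest uses autoscaling/v1, which has no behavior field); the window value is illustrative:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: django
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: django-app
  minReplicas: 5
  maxReplicas: 35
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 35
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 120  # wait before acting on short-lived spikes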

– Hector Vido
  • Thanks for your answer. I am starting to suspect that something is going wrong with the metrics. I've been testing it with a different setup. The HPA looks as follows: https://snipboard.io/we892C.jpg and the limits and requests have been set up as follows: https://snipboard.io/AtsiLW.jpg. However, when I look at the CPU utilization of the service, it doesn't come close to the specified target: https://snipboard.io/QTlfpr.jpg. What do you think? – Jeroen Beljaars Mar 21 '21 at 12:14
  • How are you testing (stressing) the application? But you should look at `limits`, not `requests`. The `requests` only specify the minimum amount of a resource needed for a pod to be scheduled on a node. – Hector Vido Mar 21 '21 at 12:54
  • The application is being used by clients. Also, when I look at the average CPU limit utilization of the pods, it's way below the target: https://snipboard.io/OiTFtc.jpg – Jeroen Beljaars Mar 21 '21 at 13:06
  • Okay, but can this mean that the application is working well and there is no reason to scale it up, or not? – Hector Vido Mar 21 '21 at 13:30
  • In this case I do not expect it to scale up, but it does trigger scale-ups anyway. The reason for implementing this is that we expect growth in the number of clients and want to be ready for traffic peaks by being able to autoscale. The problem now is that the HPA is scaling the application even though the CPU target is not being reached. Several autoscales occur every hour while the CPU levels are below target (sometimes even to 15 or more replicas). So the HPA at this moment is not working as I expected, and I don't quite understand why it's triggering upscales. – Jeroen Beljaars Mar 21 '21 at 13:41
  • Script applications (Python, PHP, Perl) can have a lot of spikes just from loading the framework files on each request, and the metrics server only keeps the last metric in memory. Try setting a larger limit, like `1000m`, and watch. Also, take a look at the HPA `scaleUp` settings; maybe you can define a larger window to scale. – Hector Vido Mar 21 '21 at 14:17
  • That explains it. So just to confirm: the CPU spikes do occur, but only in small timeframes (for example, 10-second spikes). Due to the short duration of these spikes, the metrics server doesn't save them, but only saves the metric after 1m, excluding the spike. This would explain why the spike cannot be seen in the metrics but is picked up by the HPA. Is this correct? – Jeroen Beljaars Mar 21 '21 at 15:44
  • It is possible, yes. A larger time window for scaling up services with low CPU limits can help. – Hector Vido Mar 21 '21 at 16:16
  • Hi Hector, I made a script which gathers the CPU utilization metrics every 10 seconds. You were right, the CPU spikes occur within small timeframes (sometimes 10/20 seconds). That explains why it was not shown in the metrics. Thanks for thinking along! – Jeroen Beljaars Mar 22 '21 at 13:48
  • This seems wrong? This answer says it references 'limits', but the docs and other answers say 'requests', so which is it? – Chris Stryczynski Aug 19 '22 at 16:29