
This is very strange: I'm using an AWS EKS cluster, and my HPA worked fine yesterday and this morning. Starting this afternoon, with nothing changed, my HPA suddenly stopped working!!

This is my HPA:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa-name # underscores are not valid in Kubernetes resource names
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment-name
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: my_metrics # MUST match the metrics on custom_metrics API
        target:
          type: AverageValue
          averageValue: 5
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30 # window to consider while scaling up; defaults to 0s if unset
    scaleDown:
      stabilizationWindowSeconds: 300 # window to consider while scaling down; defaults to 300s if unset
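
One way to confirm the metric is still being served is to query the custom metrics API directly; the metric name my_metrics and the default namespace below are just the placeholders from the manifest above:

# List the custom metrics the adapter currently exposes
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"

# Query the pod metric this HPA consumes (placeholder namespace and metric name)
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/my_metrics"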

When I started testing, I made many attempts, but they all failed:

NAME       REFERENCE                    TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
xxxx-hpa   Deployment/xxxx-deployment   <unknown>/5   1         10        0          5s
xxxx-hpa   Deployment/xxxx-deployment   0/5           1         10        1          16s
xxxx-hpa   Deployment/xxxx-deployment   10/5          1         10        1          3m4s
xxxx-hpa   Deployment/xxxx-deployment   9/5           1         10        1          7m38s
xxxx-hpa   Deployment/xxxx-deployment   10/5          1         10        1          8m9s

You can see the replicas above never increase!

When I describe my HPA, there are no events about scaling up, even though the current value is greater than my target; it just never scales up!!!

Name:                         hpa_name
Namespace:                    default
Labels:                       <none>
Annotations:                  kubectl.kubernetes.io/last-applied-configuration:
                                {"apiVersion":"autoscaling/v2beta2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"hpa_name","name...
CreationTimestamp:            Thu, 04 Mar 2021 20:28:40 -0800
Reference:                    Deployment/my_deployment
Metrics:                      ( current / target )
  "plex_queue_size" on pods:  10 / 5
Min replicas:                 1
Max replicas:                 10
Deployment pods:              1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from pods metric my_metrics
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:           <none>
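
The raw object can show a bit more than describe does; hpa_name below is the placeholder from this output:

kubectl get hpa hpa_name -o yaml
# look at status.lastScaleTime, status.currentReplicas, status.desiredReplicas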

What's wrong with this?

Is it possible that something is wrong with the EKS cluster???

Edit:

  1. I checked the official documentation: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details

It says scaling is skipped only when the ratio is "within a globally-configurable tolerance, from the --horizontal-pod-autoscaler-tolerance flag, which defaults to 0.1". I think even if my metric were 6/5, it should still scale up, since the ratio 1.2 is more than 0.1 away from 1.0.
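
Working the formula from that page with the current numbers above (1 replica, metric 10, target 5):

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
                = ceil[1 * (10 / 5)]
                = 2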

  2. I clearly saw my HPA working before; here is some evidence of it working 2 days ago:
NAME     REFERENCE                  TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa   Deployment/my-deployment   0/5       1         10        1          26s
my-hpa   Deployment/my-deployment   0/5       1         10        1          46s
my-hpa   Deployment/my-deployment   8/5       1         10        1          6m21s
my-hpa   Deployment/my-deployment   8/5       1         10        2          6m36s
my-hpa   Deployment/my-deployment   8/5       1         10        2          6m52s
my-hpa   Deployment/my-deployment   8/5       1         10        4          7m7s
my-hpa   Deployment/my-deployment   7/5       1         10        4          7m38s
my-hpa   Deployment/my-deployment   6750m/5   1         10        6          7m55s

But now it doesn't work. I have tried spinning up new HPAs for other metrics, and they work; it's just this one. Strange...


New Edit: Is it possible this is due to the EKS cluster? I see this:

kubectl get nodes
NAME                                           STATUS                     ROLES    AGE   VERSION
ip-172-27-177-146.us-west-2.compute.internal   Ready                      <none>   14h   v1.18.9-eks-d1db3c
ip-172-27-183-31.us-west-2.compute.internal    Ready,SchedulingDisabled   <none>   15h   v1.18.9-eks-d1db3c

Does SchedulingDisabled mean the cluster doesn't have enough capacity for the new pods?
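
Two quick checks that might help narrow this down, using the node name from the output above:

# Why is the node cordoned? Check taints, conditions, and recent events
kubectl describe node ip-172-27-183-31.us-west-2.compute.internal

# Are pods stuck Pending because they cannot be scheduled?
kubectl get pods --all-namespaces --field-selector=status.phase=Pending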

  • By [the formula in the HPA documentation](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details), there must be at least a queue size of 10 to scale up from 1 pod to 2. Is it consistently staying above that level for long enough for HPA to notice? – David Maze Mar 05 '21 at 11:57
  • If it is 6/5, it will still scale up. See the same document above: within a globally-configurable tolerance, from the --horizontal-pod-autoscaler-tolerance flag, which defaults to 0.1 – Tianbing Leng Mar 05 '21 at 17:33
  • ...you're right, it's `ceil[...]` and not `floor[...]`. So 6/5 is 1.2, which is above the tolerance threshold and rounds up to 2. I'm... not sure why it's not scaling, then. – David Maze Mar 05 '21 at 18:20
  • Is your ip-172-27-183-31.us-west-2.compute.internal a master node? Can you check the logs of this node and describe it? If it is not a master, can you enable scheduling? – Malgorzata Mar 08 '21 at 08:48
  • Checked with DevOps: we have two autoscaling groups for the nodes. Group 1 is on-demand, min 1 node, max 2 nodes. Group 2 is Spot, min 1 node, max 2 nodes. If there is no activity, there should be only one node. – Tianbing Leng Mar 08 '21 at 17:06

2 Answers


Figured it out. It was an EKS cluster capacity issue: I have a resource limit of at most 2 on-demand nodes and at most 2 Spot nodes. I needed to increase the cluster's node limit.
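
For example, if the cap is enforced by the node groups' Auto Scaling groups, raising it could look like this; the group name below is a placeholder, not the actual one from my cluster:

# Find the Auto Scaling groups backing the node groups
aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[].AutoScalingGroupName"

# Raise the max size of the on-demand group (placeholder name)
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-eks-on-demand-nodegroup \
  --max-size 4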

– Tianbing Leng

One thing that comes to mind is that your metrics-server might not be running correctly. Without data from the metrics-server, Horizontal Pod Autoscaling won't work.
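
A few quick checks, assuming the standard API service names; note that for a custom pod metric like this one, the custom metrics adapter matters as much as metrics-server itself:

# Resource metrics API (served by metrics-server)
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top nodes

# Custom metrics API (served by an adapter such as prometheus-adapter)
kubectl get apiservice v1beta1.custom.metrics.k8s.io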

– Fritz Duchardt