I've created a Deployment that exposes a custom metric through an endpoint, and an APIService that registers this custom metric so I can use it in an HPA to autoscale the Deployment. To set this up, I followed this tutorial.
It worked well with an apiregistration.k8s.io/v1beta1 APIService: the metric was exposed correctly, and the HPA could read it and scale accordingly. I then tried to update the APIService to apiregistration.k8s.io/v1 (v1beta1 is deprecated and removed in Kubernetes v1.22), but the HPA could no longer pick up the metric and reported this message:
Message
-------
unable to get metric threatmessages: Service on test services-metrics-service/unable to fetch
metrics from custom metrics API: the server is currently unable to handle the request
(get services.custom.metrics.k8s.io services-metrics-service)
If I manually request the metric, it exists though:
kubectl get --raw /apis/custom.metrics.k8s.io/v1/namespaces/test/services/services-metrics-service/threatmessages |jq .
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1",
  "metadata": {
    "selfLink": "custom.metrics.k8s.io/v1"
  },
  "items": [
    {
      "metricName": "threatmessages",
      "timestamp": "2021-02-09T14:43:39.321Z",
      "value": "0",
      "describedObject": {
        "kind": "Service",
        "namespace": "test",
        "name": "services-metrics-service",
        "apiVersion": "/v1"
      }
    }
  ]
}
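To compare what is served under different group versions, the raw path used in the request above can be composed programmatically. This is only a sketch: `custom_metric_path` is a hypothetical helper of mine (not part of kubectl or any client library), and the `v1beta1` path in the usage lines is just a candidate to try, not something I have confirmed is served.

```python
def custom_metric_path(version, namespace, resource, name, metric,
                       group="custom.metrics.k8s.io"):
    """Build the raw API path for a namespaced object metric.

    Hypothetical helper: it only mirrors the path layout of the
    kubectl --raw request shown above.
    """
    return (
        f"/apis/{group}/{version}"
        f"/namespaces/{namespace}/{resource}/{name}/{metric}"
    )


# The path I requested manually (group version v1).
v1_path = custom_metric_path(
    "v1", "test", "services", "services-metrics-service", "threatmessages"
)

# The same metric under another group version, to pass to kubectl get --raw.
v1beta1_path = custom_metric_path(
    "v1beta1", "test", "services", "services-metrics-service", "threatmessages"
)

print(v1_path)
print(v1beta1_path)
```

Each printed path can be passed to `kubectl get --raw` to see which group versions actually answer.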
Here are my APIService and HPA resources:
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1.custom.metrics.k8s.io
spec:
  insecureSkipTLSVerify: true
  group: custom.metrics.k8s.io
  groupPriorityMinimum: 1000
  versionPriority: 5
  service:
    name: services-metrics-service
    namespace: test
    port: 443
  version: v1
---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: services-parallel-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: services-parallel-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      describedObject:
        kind: Service
        name: services-metrics-service
      metric:
        name: threatmessages
      target:
        type: AverageValue
        averageValue: 4k
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 30
      policies:
      - type: Pods
        value: 1
        periodSeconds: 30
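For context on the `averageValue: 4k` target, here is a rough sketch of how I understand the metric value maps to a replica count once the HPA can read it. It assumes the Kubernetes quantity suffix `k` means 1000 and approximates the desired count as ceil(value / target), clamped to minReplicas and maxReplicas. It ignores the controller's tolerance, the stabilization window, and the scale-down policy. `parse_quantity` and `desired_replicas` are hypothetical names, and only a few decimal suffixes are handled.

```python
import math
from decimal import Decimal

# Subset of Kubernetes decimal quantity suffixes (assumption: enough here).
_SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(1000),
    "M": Decimal(10**6),
    "G": Decimal(10**9),
}


def parse_quantity(text):
    """Convert a quantity string such as '4k' or '500m' to a Decimal."""
    for suffix, factor in _SUFFIXES.items():
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * factor
    return Decimal(text)


def desired_replicas(metric_value, average_target, min_replicas, max_replicas):
    """Approximate replica count for an Object metric with AverageValue."""
    raw = math.ceil(parse_quantity(metric_value) / parse_quantity(average_target))
    # Clamp to the bounds configured in the HPA spec.
    return max(min_replicas, min(max_replicas, raw))


# Values from the response and HPA above: value "0", target 4k, 1..10 replicas.
print(desired_replicas("0", "4k", 1, 10))
# A hypothetical higher reading of 9000 messages.
print(desired_replicas("9k", "4k", 1, 10))
```

With the current reading of "0", the sketch stays at minReplicas, which matches what I would expect from the HPA if it could fetch the metric.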
What am I doing wrong? Or are these two versions simply not compatible for some reason?