Kubernetes Spare/Cold Replica/Pod

Question

I am looking for a way to keep a spare/cold replica/pod in my Kubernetes configuration. I assume it would go in my Kubernetes Deployment or HPA configuration. Any idea how I could keep 2 spare/cold instances of my app always ready, so that they only join the active pods once the HPA requests another instance? My goal is to have basically zero startup time for a new pod when the HPA says it needs another instance.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: someName
  namespace: someNamespace
  labels:
    app: someName
    version: "someVersion"
spec:
  replicas: $REPLICAS
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: someMaxSurge
      maxUnavailable: someMaxUnavailable
  selector:
    matchLabels:
      app: someName
      version: someVersion
  template:
    metadata:
      labels:
        app: someName
        version: "someVersion"
    spec:
      containers:
      - name: someName
        image: someImage:someVersion
        imagePullPolicy: Always
        resources:
          limits:
            memory: someMemory
            cpu: someCPU
          requests:
            memory: someMemory
            cpu: someCPU
        readinessProbe:
          failureThreshold: someFailureThreshold
          initialDelaySeconds: someInitialDelaySeconds
          periodSeconds: somePeriodSeconds
          timeoutSeconds: someTimeoutSeconds
        livenessProbe:
          httpGet:
            path: somePath
            port: somePort
          failureThreshold: someFailureThreshold
          initialDelaySeconds: someInitialDelay
          periodSeconds: somePeriodSeconds
          timeoutSeconds: someTimeoutSeconds
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: someName
  namespace: someNamespace
spec:
  minAvailable: someMinAvailable
  selector:
    matchLabels:
      app: someName
      version: "someVersion"       
---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: someName-hpa
  namespace: someNamespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: someName
  minReplicas: someMinReplicas
  maxReplicas: someMaxReplicas
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: someAverageUtilization

What is the use case for this? Couldn't you just scale up earlier? Alternatively have more replicas as base. Why can't they be "active"? — Jonas, Mar 30 '21 at 14:24
You're right, they could be active... Just a different way of thinking about it... Basically I just want to always have 2 spare for scaling, or in case one becomes unavailable for any reason (like health checks). I just don't want to wait when I need more or when one becomes unavailable. I know I can always over-scale, but I was thinking about scaling to what I need instead of over-scaling. — Brian, Mar 30 '21 at 14:48

score 2 · Answer 1 · answered Mar 30 '21 at 19:10

I am just wanting to always have 2 spare for scaling, or if one becomes unavailable for any reason

It is good practice to run at least two replicas of a service on Kubernetes. This helps if, for example, a node goes down or you need to take a node out for maintenance. Also set Pod Topology Spread Constraints so that those pods are scheduled onto different nodes.
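As a rough sketch of the spread constraint mentioned above, the fragment below would go into the pod template of the Deployment from the question. The `maxSkew`, `topologyKey` and `whenUnsatisfiable` values are assumptions chosen for illustration, not values from the question; only the `app: someName` label is taken from it.

```yaml
# Fragment of the Deployment (spec.template.spec), not a complete manifest.
spec:
  template:
    spec:
      topologySpreadConstraints:
      - maxSkew: 1                           # assumed: pod counts per node may differ by at most 1
        topologyKey: kubernetes.io/hostname  # assumed: spread across individual nodes
        whenUnsatisfiable: ScheduleAnyway    # assumed: prefer spreading, but do not block scheduling
        labelSelector:
          matchLabels:
            app: someName                    # label from the question's pod template
```

With `ScheduleAnyway` the constraint is only a preference; `DoNotSchedule` would make it a hard requirement, at the cost of pods staying Pending when no suitable node is free.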

Set the minimum number of replicas you want as the desired state. All ready replicas receive traffic, because Kubernetes load balances across them, so the "spare" pods are simply active pods that give you headroom. Also use the Horizontal Pod Autoscaler to define when to scale out to more replicas. If you want to scale up early, set a low utilization target for autoscaling.
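One way this could look, reusing the HPA from the question: raise `minReplicas` to cover the baseline plus the headroom you want, and lower the CPU target so scaling starts earlier. The numbers below are assumptions for illustration only (a baseline of 2 pods plus 2 of headroom, and a 50% target); pick values that match your own load.

```yaml
apiVersion: autoscaling/v2beta2   # as in the question; newer clusters serve autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: someName-hpa
  namespace: someNamespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: someName
  minReplicas: 4                  # assumed: 2 baseline pods + 2 pods of headroom
  maxReplicas: 10                 # assumed upper bound
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50    # assumed: a lower target triggers scale-up earlier
```

The extra pods are not cold standbys here: they are ready and serving traffic, so each pod runs at lower utilization and a failed pod or a load spike is absorbed while the HPA adds more.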
