How do multiple replicas/pods scale Kubernetes?

Question

From what I understand, using multiple replicas as well as auto-scaling is supposed to help in the case that lots of people visit your website and make calls to services provided by your Kubernetes cluster.

How do the replicas help with scaling?

Aren't these extra pods all just running on the same computer with constant resources?
That would mean that they're all limited by a constant amount of CPU and memory.

After posting the question, I remembered that each pod is given a specific amount of resources. I assumed that you'd set each pod to have somewhere between 20-40% of a computer's resources, but that doesn't have to be the case. — abrarisme, Jan 26 '18 at 07:05

score 2 · Accepted Answer · answered Jan 26 '18 at 08:04

Kubernetes has a couple of scaling mechanisms. The Horizontal Pod Autoscaler (HPA) is the most basic one, but not the only one.

With HPA you can spin up additional pods according to some metrics (most commonly CPU and memory). At some point your cluster nodes will no longer have enough free resources to satisfy the resource requests of your pods, and the new pods will sit in the Pending state because no node is available to schedule them on.
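As a rough illustration of how the HPA decides on a replica count, the Kubernetes documentation describes the core rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, skipping the change when the ratio is within a tolerance (0.1 by default). The Python below is only a sketch of that arithmetic; the function name and the example numbers are made up for illustration, and the real controller also handles things like unready pods and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Sketch of the HPA scaling rule (hypothetical helper, not a Kubernetes API)."""
    ratio = current_metric / target_metric
    # Within the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# 3 pods averaging 90% CPU against a 50% target: scale out.
print(desired_replicas(3, 90, 50))
# 4 pods averaging 52% CPU against a 50% target: inside the tolerance, no change.
print(desired_replicas(4, 52, 50))
```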

At that point a Cluster Autoscaler can kick in and, e.g., scale an AWS ASG (or some other cloud-ish node pool) to add a new node to the cluster and make space for the pending pod(s).
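To make the Pending situation concrete, here is a toy first-fit placement in Python. It is a deliberately simplified sketch, not how the real scheduler works: it only looks at CPU requests in millicores, and the pod names and node sizes are invented for the example.

```python
def schedule(pod_requests, node_capacities):
    """Toy first-fit placement: returns (placements, pending pod names)."""
    free = list(node_capacities)  # remaining CPU (millicores) per node
    placements = {}
    pending = []
    for pod, request in pod_requests.items():
        for node, available in enumerate(free):
            if request <= available:
                free[node] -= request
                placements[pod] = node
                break
        else:
            # No node has enough free CPU: the pod stays Pending.
            pending.append(pod)
    return placements, pending


# Five replicas, each requesting 500m CPU (hypothetical values).
pods = {f"web-{i}": 500 for i in range(5)}

# Two nodes with 1000m each fit four replicas; the fifth is Pending.
print(schedule(pods, [1000, 1000])[1])
# After a third node is added, nothing is Pending.
print(schedule(pods, [1000, 1000, 1000])[1])
```

This also speaks to the original question: replicas are not all confined to one machine, because each one is placed on whichever node has room for its requests, and adding nodes adds total capacity.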
