Kubernetes Core Concepts: Horizontal Pod Autoscaling (autoscaling/v2) with Metrics

adil · 4 min read · Nov 9, 2023

Part 1: Kubernetes Core Concepts: Kube State Metrics and Metrics Server

One of the advantages of running the metrics server is the ability to scale pods horizontally.


See: Terminology Confusion: Horizontal/Vertical Partitioning, Scaling, Sharding

Let’s deploy a pod and try to scale it via the metrics server:

00-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-debug
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-debug
  template:
    metadata:
      labels:
        app: web-debug
    spec:
      containers:
      - image: ailhan/web-debug
        name: web-debug

Apply:

➜  ~ kubectl apply -f 00-deployment.yaml
deployment.apps/web-debug created

I installed the metrics server. (More details are in Part 1)

The goal: when average memory utilization exceeds 50%, create additional pods for the web-debug deployment.

01-hpa-memory.yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-debug-scaling
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-debug
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 50

Apply:

➜  ~ kubectl apply -f 01-hpa-memory.yaml
horizontalpodautoscaler.autoscaling/web-debug-scaling created

Let’s check the installation:

The web-debug deployment and the Horizontal Pod Autoscaler (HPA) rule were created.
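The check can be done with standard kubectl commands like the following (the output shapes described in the comments are illustrative, not captured from the original cluster):

```shell
# List the deployment and the HPA rule. Before resource requests are
# set on the container, the HPA's TARGETS column shows <unknown>/50%.
kubectl get deployment web-debug
kubectl get hpa web-debug-scaling

# Show the HPA's conditions and events, including any scaling errors.
kubectl describe hpa web-debug-scaling
```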

However, the HPA rule for the web-debug deployment reports an unknown value in its targets column. Let's examine the rule:

The HPA gives the following error:

failed to get memory utilization: missing request for memory in container

Kubernetes cannot start the scaling procedure because utilization is calculated as a percentage of the container's memory request, and our container does not declare one. We need to set the requested (minimum guaranteed) memory for the deployment:

02-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-debug
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-debug
  template:
    metadata:
      labels:
        app: web-debug
    spec:
      containers:
      - image: ailhan/web-debug
        name: web-debug
        resources:
          requests:
            memory: "40Mi"

Apply:

➜  ~ kubectl apply -f 02-deployment.yaml
deployment.apps/web-debug configured

Let’s take a look at the deployment:

The container now requests 40 MiB of memory from Kubernetes. Utilization is measured against this request: once average usage across the pods exceeds 50% of it (20 MiB), the HPA creates new pods.

Now, let's take a look at the HPA rule:

The unknown value is gone.

Let’s increase memory usage:
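One way to drive memory usage up is to run stress-ng inside the pod, as in the original demo. This assumes the ailhan/web-debug image ships stress-ng; the exact worker size is an illustrative choice:

```shell
# Start one memory worker that allocates and holds 30 MiB,
# pushing usage well above 50% of the 40 MiB request.
kubectl exec -it deploy/web-debug -- stress-ng --vm 1 --vm-bytes 30M --vm-keep
```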

Let’s take a look at the memory usage and number of allocated pods:
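Usage and replica count can be observed with the metrics server and the HPA status (commands are standard kubectl; the names match the manifests above):

```shell
# Per-pod CPU and memory usage, served by the metrics server.
kubectl top pods

# Current utilization vs. target and the live replica count.
kubectl get hpa web-debug-scaling --watch
```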

The memory usage of the container where I ran the stress-ng command is far above the 50% target, so the HPA scales the deployment out with three additional pods, up to its maxReplicas of 4.

Horizontal Scaling based on CPU Usage

03-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-debug
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-debug
  template:
    metadata:
      labels:
        app: web-debug
    spec:
      containers:
      - image: ailhan/web-debug
        name: web-debug
        resources:
          requests:
            memory: "40Mi"
            cpu: "100m"

Apply:

➜  ~ kubectl apply -f 03-deployment.yaml
deployment.apps/web-debug configured

Let’s take a look at the deployment:

04-hpa-memory-cpu.yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-debug-scaling
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-debug
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60

Apply:

➜  ~ kubectl apply -f 04-hpa-memory-cpu.yaml
horizontalpodautoscaler.autoscaling/web-debug-scaling configured

Kubernetes now knows both the memory and the CPU requests. When multiple metrics are configured, the HPA computes a desired replica count for each metric and uses the largest one.

Let’s check the configuration:
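Both metrics should now appear in the HPA's targets (the command is standard kubectl; the described output is illustrative):

```shell
# Expect two targets: resource memory (…/50%) and resource cpu (…/60%).
kubectl describe hpa web-debug-scaling
```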

Let’s increase the CPU usage and see the results:
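CPU load can be generated the same way as the memory load, again assuming stress-ng is available in the image; the worker count and timeout are illustrative:

```shell
# One CPU worker easily exceeds 60% of the 100m CPU request.
kubectl exec -it deploy/web-debug -- stress-ng --cpu 1 --timeout 120s
```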

Three more pods were created due to the high CPU usage.
