Kubernetes Core Concepts: Horizontal Pod Autoscaling (autoscaling/v2) with Metrics
Part 1: Kubernetes Core Concepts: Kube State Metrics and Metrics Server
One of the advantages of having a metrics server is the ability to scale a workload horizontally based on observed resource usage.
See: Terminology Confusion: Horizontal/Vertical Partitioning, Scaling, Sharding
Let’s deploy a pod and try to scale it via the metrics server:
00-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-debug
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-debug
  template:
    metadata:
      labels:
        app: web-debug
    spec:
      containers:
      - image: ailhan/web-debug
        name: web-debug
Apply:
➜ ~ kubectl apply -f 00-deployment.yaml
deployment.apps/web-debug created
I installed the metrics server. (More details are in Part 1.)
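Before relying on the HPA, it is worth confirming that the metrics API is actually serving data. A quick check (the command is standard kubectl; the exact error text varies by version):

```shell
# Confirm the metrics server is responding with per-pod usage figures.
kubectl top pods
# If the metrics server is not installed or not ready, this fails with
# an error such as "Metrics API not available".
```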
The goal: if average memory utilization exceeds 50%, create a new pod for the web-debug deployment.
01-hpa-memory.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-debug-scaling
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-debug
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 50
Apply:
➜ ~ kubectl apply -f 01-hpa-memory.yaml
horizontalpodautoscaler.autoscaling/web-debug-scaling created
Let’s check the installation:
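A way to check both objects at once (the `<unknown>` value in the output is illustrative of what you see at this stage):

```shell
# List the deployment and the HPA rule.
kubectl get deployment web-debug
kubectl get hpa web-debug-scaling
# The HPA's TARGETS column shows "<unknown>/50%" at this point.
```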
The web-debug deployment and the Horizontal Pod Autoscaler (HPA) rule were created.
However, the HPA rule for the web-debug deployment shows an unknown value in its targets. Let’s examine the rule:
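The HPA's events explain why the target is unknown; `kubectl describe` surfaces them:

```shell
# Inspect the HPA's conditions and events for the failure reason.
kubectl describe hpa web-debug-scaling
```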
The HPA gives the following error:
failed to get memory utilization: missing request for memory in container
Kubernetes cannot start the scaling calculation because utilization is computed as a percentage of the container’s memory request, and no request is set for the deployment.
We need to set a memory request (the minimum guaranteed memory) for the container:
02-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-debug
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-debug
  template:
    metadata:
      labels:
        app: web-debug
    spec:
      containers:
      - image: ailhan/web-debug
        name: web-debug
        resources:
          requests:
            memory: "40Mi"
Apply:
➜ ~ kubectl apply -f 02-deployment.yaml
deployment.apps/web-debug configured
Let’s take a look at the deployment:
Each pod now requests 40 MiB of memory from Kubernetes. If average usage exceeds 50% of that request, the HPA will create additional pods for the deployment.
Now, let’s take a look at the HPA rule:
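The scaling decision behind this follows the standard HPA formula: desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). A minimal sketch with hypothetical numbers:

```shell
current_replicas=1
current_utilization=190   # observed average utilization in percent (hypothetical)
target_utilization=50     # the averageUtilization set in the HPA rule

# Integer ceiling division: ceil(a / b) == (a + b - 1) / b
desired=$(( (current_replicas * current_utilization + target_utilization - 1) / target_utilization ))
echo "$desired"   # prints 4; the HPA also caps the result at maxReplicas
```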
The unknown value is gone.
Let’s increase memory usage:
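The exact command used is not shown here; one way to do it, assuming stress-ng is available in the image, is to exec into one of the pods (the pod name below is hypothetical):

```shell
# Allocate ~30 MiB inside one pod of the deployment for five minutes.
kubectl exec -it web-debug-6d5f9c7b8-abcde -- \
  stress-ng --vm 1 --vm-bytes 30M --timeout 300s
```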
Let’s take a look at the memory usage and number of allocated pods:
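These can be observed with (output omitted; the original post showed a screenshot here):

```shell
# Per-pod memory usage, the HPA's current reading, and the replica count.
kubectl top pods
kubectl get hpa web-debug-scaling
kubectl get pods -l app=web-debug
```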
Memory usage in the container where I ran the stress-ng command is very high. Because the average utilization is above the 50% target, Kubernetes creates three additional pods, reaching the maxReplicas limit of 4.
Horizontal Scaling based on CPU Usage
03-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-debug
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-debug
  template:
    metadata:
      labels:
        app: web-debug
    spec:
      containers:
      - image: ailhan/web-debug
        name: web-debug
        resources:
          requests:
            memory: "40Mi"
            cpu: "100m"
Apply:
➜ ~ kubectl apply -f 03-deployment.yaml
deployment.apps/web-debug configured
Let’s take a look at the deployment:
04-hpa-memory-cpu.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-debug-scaling
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-debug
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
Apply:
➜ ~ kubectl apply -f 04-hpa-memory-cpu.yaml
horizontalpodautoscaler.autoscaling/web-debug-scaling configured
Kubernetes is now aware of both the memory and CPU requests, so the HPA can evaluate both metrics.
Let’s check the configuration:
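With two metrics defined, the TARGETS column lists both readings (the values shown in the comment are illustrative):

```shell
kubectl get hpa web-debug-scaling
# TARGETS now shows both metrics, e.g. "30%/50%, 1%/60%".
kubectl describe hpa web-debug-scaling
```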
Let’s increase the CPU usage and see the results:
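As with the memory test, one way to generate CPU load, assuming stress-ng is in the image (the pod name is hypothetical):

```shell
# Burn CPU inside one pod of the deployment for two minutes.
kubectl exec -it web-debug-6d5f9c7b8-abcde -- \
  stress-ng --cpu 2 --timeout 120s
```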
Three more pods were created due to high CPU usage: