In this lab, we’ll see how Kubernetes supports CPU-based autoscaling through the Horizontal Pod Autoscaler (HPA). We can configure this by setting a target CPU utilization percentage along with the minimum and maximum number of allowed replicas.
For collecting the metrics, we’ll use Metrics Server, a solution maintained by the Kubernetes project. Once Metrics Server is running, the autoscaler can retrieve the collected metrics by calling the Kubernetes Metrics API.
Here’s the architecture that we’ll use.
Before we start, let’s first verify that we’re using the correct IAM user’s access keys. This should be the user we created in the prerequisites section above.
$ aws sts get-caller-identity
{
    "UserId": "AIDxxxxxxxxxxxxxx",
    "Account": "1234567890",
    "Arn": "arn:aws:iam::1234567890:user/k8s-admin"
}
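If the returned identity is not the expected user, switch to the correct access keys before continuing. A quick way to do this is to reconfigure the default credentials:

aws configure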
For the cluster, we can reuse the eksops.yml file from the previous labs.
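In case you no longer have the file around, here’s a minimal sketch of what an eksctl config for this lab could look like. The node group settings are assumptions, so adjust them to match your own setup from the previous labs.

# eksops.yml -- minimal sketch; node group values are assumptions
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: eksops              # matches MYCLUSTER below
  region: ap-southeast-1    # matches MYREGION below
nodeGroups:
  - name: ng-1
    instanceType: t3.medium
    desiredCapacity: 2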
Launch the cluster.
time eksctl create cluster -f eksops.yml
Check the nodes.
kubectl get nodes
Save the cluster name, region, and AWS account ID in variables. We’ll be using these in many of the commands later.
MYREGION=ap-southeast-1
MYCLUSTER=eksops
MYAWSID=$(aws sts get-caller-identity | python3 -c "import sys,json; print (json.load(sys.stdin)['Account'])")
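Alternatively, the AWS CLI can extract the account ID on its own, without piping through Python:

MYAWSID=$(aws sts get-caller-identity --query Account --output text)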
We’ll use namespace-autoscaling.yml to create the autoscaling namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: autoscaling
  labels:
    app: counter
Apply.
kubectl apply -f namespace-autoscaling.yml
Verify.
$ kubectl get ns
NAME          STATUS   AGE
default       Active   8h
autoscaling   Active   18s
Install Metrics Server v0.6.1 by applying the manifest from its GitHub release.
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.6.1/components.yaml
It should return the following output:
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
Verify the state of the deployment.
kubectl get deployment metrics-server -n kube-system
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   1/1     1            1           91s
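With Metrics Server up, we can confirm that the Metrics API is actually serving data. kubectl top reads from the same API the autoscaler will use:

kubectl top nodes
kubectl top pods -n kube-system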
We’ll use the same deployment-data.yml and deployment-support.yml from the previous lab, but we’ll specify the namespace as autoscaling.
Apply the changes.
kubectl apply -f deployment-data.yml -f deployment-support.yml
Let’s now take a look at our deployment-app.yml. We have defined the resource requests in the Pod spec section: each Pod will request 20 milliCPU (20m). Defining a CPU request is required here, since the HPA measures utilization as a percentage of what each Pod requests.
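The relevant part of the spec looks roughly like this. This is a sketch only; the container name is an assumption and the image placeholder stands in for the one used in the previous labs.

containers:
  - name: server          # container name is an assumption
    image: <app-image>    # image carried over from the previous labs
    resources:
      requests:
        cpu: 20m          # 20 milliCPU; HPA utilization is measured against this request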
Apply.
kubectl apply -f deployment-app.yml
Check the pods.
$ kubectl get pods -n autoscaling
NAME                            READY   STATUS    RESTARTS   AGE
app-tier-6ddbb5d4b7-fkrph       1/1     Running   0          10m
app-tier-6ddbb5d4b7-h627c       1/1     Running   0          10m
app-tier-6ddbb5d4b7-kvbv6       1/1     Running   0          10m
app-tier-6ddbb5d4b7-vkd85       1/1     Running   0          10m
app-tier-6ddbb5d4b7-xsrj5       1/1     Running   0          10m
data-tier-6c8f55b94f-vwqj6      1/1     Running   0          11m
support-tier-66f4cc4f7c-qkc5s   2/2     Running   0          5s
Verify the deployment.
$ kubectl get deployments app-tier -n autoscaling
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
app-tier   5/5     5            5           2m57s
We’ll now use a Kubernetes resource called HorizontalPodAutoscaler, which scales the Pods in and out. In horizontal-autoscaler.yml, we’re setting the minimum to 3 replicas and the maximum to 5 replicas.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: app-tier
  namespace: autoscaling
  labels:
    app: microservices
    tier: app
spec:
  maxReplicas: 5
  minReplicas: 3
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-tier
  targetCPUUtilizationPercentage: 70
We’ve also set targetCPUUtilizationPercentage to 70%, which means that if the average CPU utilization across the Pods reaches 70% or higher, the autoscaler will add Pods until utilization drops back below the target.
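For reference, the replica count the HPA aims for follows the formula documented by Kubernetes:

desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)

For example, if our 3 replicas average 140% CPU utilization against the 70% target, the autoscaler computes ceil(3 * 140 / 70) = 6 and then caps the result at maxReplicas, scaling out to 5.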
Apply.
kubectl apply -f horizontal-autoscaler.yml
We should now see 3 pods for the app-tier.
$ kubectl get deployments -n autoscaling
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
app-tier       3/3     3            3           15m
data-tier      1/1     1            1           15m
support-tier   1/1     1            1           4m28s
$ kubectl get pods -n autoscaling
NAME                            READY   STATUS    RESTARTS   AGE
app-tier-6ddbb5d4b7-4px9r       1/1     Running   0          19s
app-tier-6ddbb5d4b7-kvbv6       1/1     Running   0          14m
app-tier-6ddbb5d4b7-w7vpz       1/1     Running   0          19s
data-tier-6c8f55b94f-vwqj6      1/1     Running   0          15m
support-tier-66f4cc4f7c-qkc5s   2/2     Running   0          3m55s
To see how much CPU our app-tier Pods are using, run the command below. Here we can see that they’re only at 3%, well below the 70% CPU target.
$ kubectl get hpa -n autoscaling
NAME       REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
app-tier   Deployment/app-tier   3%/70%    3         5         3          9m27s
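To keep this view open and observe the replica count change over time, add the --watch flag:

kubectl get hpa -n autoscaling --watch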
To see more details about the HPA:
$ kubectl describe hpa -n autoscaling
Warning: autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler
Name:               app-tier
Namespace:          autoscaling
Labels:             app=microservices
                    tier=app
Annotations:        <none>
CreationTimestamp:  Sat, 24 Sep 2022 07:37:36 +0800
Reference:          Deployment/app-tier
Metrics:            ( current / target )
  resource cpu on pods (as a percentage of request):  3% (0) / 70%
Min replicas:       3
Max replicas:       5
Deployment pods:    3 current / 3 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range
Events:
  Type    Reason             Age    From                       Message
  ----    ------             ----   ----                       -------
  Normal  SuccessfulRescale  2m19s  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
  Normal  SuccessfulRescale  109s   horizontal-pod-autoscaler  New size: 3; reason: Current number of replicas below Spec.MinReplicas
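The events above show the autoscaler settling at the minimum of 3 replicas because the app is idle. To actually watch it scale out, we can generate CPU load against the app tier. This is a minimal sketch: the service name app-tier and port 8080 are assumptions based on the previous labs, so adjust them to match your manifests.

# throwaway pod that hits the app tier in a tight loop (service name and port are assumed)
kubectl run load-gen -n autoscaling --image=busybox:1.36 --restart=Never \
  -- /bin/sh -c "while true; do wget -q -O- http://app-tier:8080; done"

# in a second terminal, watch the HPA react once utilization climbs past 70%
kubectl get hpa -n autoscaling --watch

# clean up the load generator when done
kubectl delete pod load-gen -n autoscaling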
Head on to the next lab to see how rolling updates and rollbacks work.