First, we’ll create dedicated nodegroups with autoscaling enabled.
Then, we’ll deploy the Cluster Autoscaler.
Finally, we’ll create the NGINX deployment.
For this lab, we’ll be using the ap-southeast-1 region (Singapore).
Let’s check the cluster manifest.
We can see that there are four nodegroups here:
ng-dover: This is a simple nodegroup which will consist of 3 nodes.
scale-clementi-1a: This nodegroup will be confined to the ap-southeast-1a AZ.
scale-clementi-1b: This nodegroup will be confined to the ap-southeast-1b AZ.
scale-spot: This nodegroup will span three AZs.
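Based on these values, a minimal sketch of what green-line.yml could look like is shown below. Anything not stated above (the exact AZ lists, the spot settings, and the auto-discovery tags) is an assumption and may differ from the actual file; the instance-type: spot label matches the label we rely on later in this lab.

# Minimal sketch of green-line.yml, assembled from the values used in this lab.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: green-line
  region: ap-southeast-1
  version: "1.22"

nodeGroups:
  - name: ng-dover
    instanceType: m5.large
    desiredCapacity: 3

  - name: scale-clementi-1a
    instanceType: t3.large
    desiredCapacity: 1
    minSize: 1
    maxSize: 10
    availabilityZones: ["ap-southeast-1a"]
    iam:
      withAddonPolicies:
        autoScaler: true
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/green-line: "owned"

  - name: scale-clementi-1b
    instanceType: t3.large
    desiredCapacity: 1
    minSize: 1
    maxSize: 10
    availabilityZones: ["ap-southeast-1b"]
    iam:
      withAddonPolicies:
        autoScaler: true
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/green-line: "owned"

  - name: scale-spot
    desiredCapacity: 1
    minSize: 1
    maxSize: 10
    availabilityZones: ["ap-southeast-1a", "ap-southeast-1b", "ap-southeast-1c"]
    instancesDistribution:
      instanceTypes: ["t3.large"]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0   # 100% spot capacity
    labels:
      instance-type: spot
    iam:
      withAddonPolicies:
        autoScaler: true
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/green-line: "owned"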
Create the cluster.
$ eksctl create cluster -f green-line.yml
We can also save the cluster name in a variable since we’ll be using it multiple times in this lab.
$ export MYCLUSTER=green-line
To verify the nodes,
$ kubectl get nodes
Inspect the nodegroups.
$ eksctl get nodegroup --cluster $MYCLUSTER
We can do some parsing to display only the important columns:
$ eksctl get nodegroup --cluster green-line | awk '{if (NR!=1) {print $1 " " $2 " " $5 " " $6 " " $7 " " $8}}' | column -t
green-line ng-dover 3 3 3 m5.large
green-line scale-clementi-1a 1 10 1 t3.large
green-line scale-clementi-1b 1 10 1 t3.large
green-line scale-spot 1 10 1 t3.large
Let’s now delete the first node group, ng-dover.
$ time eksctl delete nodegroup -f green-line.yml \
--include="ng-dover" \
--approve
Verify.
$ kubectl get nodes
$ eksctl get nodegroup --cluster $MYCLUSTER
As a brief explanation about the Cluster Autoscaler:
Cluster Autoscaler is a tool that automatically adjusts the size of the Kubernetes cluster when one of the following conditions is true:
- there are pods that failed to run in the cluster due to insufficient resources.
- there are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes.
It is also recommended to use a Cluster Autoscaler version that matches the Kubernetes control plane version. From our green-line.yml manifest,
metadata:
  version: "1.22"
Check the releases on the Kubernetes Cluster Autoscaler GitHub page for a matching version. We’ll be using version 1.22.3.
We will use the available Cluster Autoscaler manifest from the Kubernetes Github repository.
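If you don’t have the manifest locally yet, you can download the example manifest from the kubernetes/autoscaler repository and save it under the filename used in this lab (the URL below is where the example manifest normally lives; adjust it if the repository layout has changed):
$ curl -o cluster-autoscaler-autodiscover.yml \
    https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml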
Note that if you use the same YAML file, you’ll need to edit lines 147 and 163 so that they read:
[line 147] - image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.22.3
[line 163] --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/green-line
Let’s now deploy the cluster autoscaler.
$ kubectl apply -f cluster-autoscaler-autodiscover.yml
It should return the following output.
serviceaccount/cluster-autoscaler created
clusterrole.rbac.authorization.k8s.io/cluster-autoscaler created
role.rbac.authorization.k8s.io/cluster-autoscaler created
clusterrolebinding.rbac.authorization.k8s.io/cluster-autoscaler created
rolebinding.rbac.authorization.k8s.io/cluster-autoscaler created
deployment.apps/cluster-autoscaler created
We need to annotate the deployment as well. Annotations are used to attach arbitrary, non-identifying metadata to Kubernetes objects. This usually consists of machine-generated data, and can even be stored in JSON form.
In our setup, annotating the deployment ensures that the autoscaler does not evict its own pods.
$ kubectl -n kube-system annotate deployment.apps/cluster-autoscaler \
cluster-autoscaler.kubernetes.io/safe-to-evict="false" \
--overwrite
As a final step, let’s edit the deployment itself and update the relevant lines:
$ kubectl -n kube-system edit deployment.apps/cluster-autoscaler
In Vim, you can enter “:set nu” to display the line numbers. Edit lines 50 to 52. Once you save and exit, the deployment will automatically be updated. Verify:
$ kubectl -n kube-system describe deployment cluster-autoscaler
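The lab doesn’t reproduce what lines 50 to 52 contain, so treat the snippet below as a rough, hypothetical guide only: in the downloaded manifest, the container command block of the cluster-autoscaler deployment looks roughly like this, and the edits commonly recommended for EKS are to confirm the image tag and node-group-auto-discovery values and to add the last two flags.
      containers:
        - image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.22.3
          name: cluster-autoscaler
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/green-line
            - --balance-similar-node-groups          # commonly added for multi-AZ nodegroups (assumption)
            - --skip-nodes-with-system-pods=false    # commonly added per AWS guidance (assumption)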
Verify that the cluster autoscaler pod is running.
$ kubectl -n kube-system get pods
If it doesn’t display the cluster autoscaler, you can also run:
$ kubectl get pods -A
Time to see the cluster autoscaler in action. To do this, we’ll create an NGINX deployment and scale it. Let’s check the manifest.
We’ll start the deployment with 2 replicas. We’ll use the NGINX image, and we’ll also set resource requests and limits: each container requests 500m of CPU, so a pod will only be scheduled on a node that has that much CPU available.
Lastly, recall the labels we set for the spot instances nodegroup in green-line.yml.
In our NGINX manifest, we’re telling EKS to deploy the pods only to nodes that have the label “instance-type: spot”.
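Here’s a minimal sketch of what nginx.yml could contain, based on the description above (the deployment name matches the pod names shown later; the app label key and the exact limits are assumptions):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-autoscaler
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-autoscaler
  template:
    metadata:
      labels:
        app: nginx-autoscaler
    spec:
      nodeSelector:
        instance-type: spot        # only schedule onto the spot nodegroup
      containers:
        - name: nginx
          image: nginx
          resources:
            requests:
              cpu: 500m            # this request drives the scheduling (and scaling) decisions
            limits:
              cpu: 500m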
Let’s now apply this manifest.
$ kubectl apply -f nginx.yml
Check the pods. They will first go through the ContainerCreating stage before changing to Running.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-autoscaler-8458c98b94-ntztw 1/1 Running 0 35s
nginx-autoscaler-8458c98b94-vqrjl 1/1 Running 0 35s
To check all the pods that are running across all namespaces,
$ kubectl get pods -A
This should spin up a single spot instance. To verify,
$ kubectl get nodes -l instance-type=spot
At this point, we only have 2 pods running on a single spot instance.
$ kubectl get pods
$ kubectl get nodes -l instance-type=spot
Let’s try to scale the NGINX autoscaler to 3.
$ kubectl scale --replicas=3 deployment/nginx-autoscaler
Verify.
$ kubectl get pods
Now, let’s try to scale it to 6 pods.
$ kubectl scale --replicas=6 deployment/nginx-autoscaler
We’ll see that one of the new pods gets stuck in the Pending state. This is because the autoscaler is still launching a second spot instance that will run it.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-autoscaler-8458c98b94-c7jv8 1/1 Running 0 46s
nginx-autoscaler-8458c98b94-hcpgr 0/1 Pending 0 4s
nginx-autoscaler-8458c98b94-khmf8 1/1 Running 0 16s
nginx-autoscaler-8458c98b94-ntztw 1/1 Running 0 12m
nginx-autoscaler-8458c98b94-rgsjf 1/1 Running 0 5m6s
nginx-autoscaler-8458c98b94-vqrjl 1/1 Running 0 12m
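We can also describe the Pending pod to see why it could not be scheduled; if the Cluster Autoscaler picked it up, its events should include a TriggeredScaleUp entry (the pod name below is taken from the output above):
$ kubectl describe pod nginx-autoscaler-8458c98b94-hcpgr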
We only had 1 spot instance previously. If we check again,
$ kubectl get nodes -l instance-type=spot
NAME STATUS ROLES AGE VERSION
ip-192-168-12-178.ap-southeast-1.compute.internal Ready <none> 2m12s v1.22.12-eks-ba74326
ip-192-168-43-200.ap-southeast-1.compute.internal Ready <none> 164m v1.22.12-eks-ba74326
Now if we check the pods again, all of them are in the Running state.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-autoscaler-8458c98b94-hcpgr 1/1 Running 0 5m26s
nginx-autoscaler-8458c98b94-jz22k 1/1 Running 0 6s
nginx-autoscaler-8458c98b94-lc2qp 1/1 Running 0 5s
nginx-autoscaler-8458c98b94-q564z 1/1 Running 0 6s
nginx-autoscaler-8458c98b94-vvlt5 1/1 Running 0 5s
nginx-autoscaler-8458c98b94-z4nj9 1/1 Running 0 6s
The same mechanism works in reverse when we reduce the replicas: some pods are terminated, and the number of spot instances is then reduced accordingly.
Let’s first scale to 20 replicas. This should spin up additional spot instances.
$ kubectl scale --replicas=20 deployment/nginx-autoscaler
$ kubectl get pods
$ kubectl get nodes -l instance-type=spot
NAME STATUS ROLES AGE VERSION
ip-192-168-12-178.ap-southeast-1.compute.internal Ready <none> 8m25s v1.22.12-eks-ba74326
ip-192-168-43-200.ap-southeast-1.compute.internal Ready <none> 171m v1.22.12-eks-ba74326
ip-192-168-74-36.ap-southeast-1.compute.internal Ready <none> 78s v1.22.12-eks-ba74326
ip-192-168-8-54.ap-southeast-1.compute.internal Ready <none> 82s v1.22.12-eks-ba74326
Now let’s bring down the replicas to just 1. We’ll see the pods terminating.
$ kubectl scale --replicas=1 deployment/nginx-autoscaler
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-autoscaler-8458c98b94-9nh9m 1/1 Terminating 0 2m48s
nginx-autoscaler-8458c98b94-gdmsd 1/1 Running 0 2m48s
nginx-autoscaler-8458c98b94-zkd4s 1/1 Terminating 0 2m48s
The default cooldown for spot instances is 5 minutes. This means it will take around 5 minutes before the spot instances are scaled back down to the minimum needed to handle the remaining pods.
$ kubectl get nodes -l instance-type=spot
To check the logs of the cluster autoscaler:
$ kubectl -n kube-system logs deployment.apps/cluster-autoscaler
We can filter the output to only show the scaling events:
$ kubectl -n kube-system logs deployment.apps/cluster-autoscaler | grep -A5 "Expanding"
This would return:
I0824 15:27:44.672038 1 waste.go:57] Expanding Node Group eksctl-green-line-nodegroup-scale-spot-NodeGroup-TIP9P0PV6I19 would waste 85.00% CPU, 93.48% Memory, 89.24% Blended
I0824 15:27:44.672062 1 scale_up.go:468] Best option to resize: eksctl-green-line-nodegroup-scale-spot-NodeGroup-TIP9P0PV6I19
I0824 15:27:44.672072 1 scale_up.go:472] Estimated 1 nodes needed in eksctl-green-line-nodegroup-scale-spot-NodeGroup-TIP9P0PV6I19
I0824 15:27:44.672093 1 scale_up.go:586] Final scale-up plan: [{eksctl-green-line-nodegroup-scale-spot-NodeGroup-TIP9P0PV6I19 1->2 (max: 10)}]
I0824 15:27:44.672114 1 scale_up.go:675] Scale-up: setting group eksctl-green-line-nodegroup-scale-spot-NodeGroup-TIP9P0PV6I19 size to 2
I0824 15:27:44.672164 1 auto_scaling_groups.go:219] Setting asg eksctl-green-line-nodegroup-scale-spot-NodeGroup-TIP9P0PV6I19 size to 2
--
I0824 15:34:47.328761 1 waste.go:57] Expanding Node Group eksctl-green-line-nodegroup-scale-spot-NodeGroup-TIP9P0PV6I19 would waste 32.50% CPU, 70.68% Memory, 51.59% Blended
I0824 15:34:47.328785 1 scale_up.go:468] Best option to resize: eksctl-green-line-nodegroup-scale-spot-NodeGroup-TIP9P0PV6I19
I0824 15:34:47.328798 1 scale_up.go:472] Estimated 2 nodes needed in eksctl-green-line-nodegroup-scale-spot-NodeGroup-TIP9P0PV6I19
I0824 15:34:47.328821 1 scale_up.go:586] Final scale-up plan: [{eksctl-green-line-nodegroup-scale-spot-NodeGroup-TIP9P0PV6I19 2->4 (max: 10)}]
I0824 15:34:47.328842 1 scale_up.go:675] Scale-up: setting group eksctl-green-line-nodegroup-scale-spot-NodeGroup-TIP9P0PV6I19 size to 4
I0824 15:34:47.328873 1 auto_scaling_groups.go:219] Setting asg eksctl-green-line-nodegroup-scale-spot-NodeGroup-TIP9P0PV6I19 size to 4
Make sure to delete the cluster after the lab to save costs.
$ time eksctl delete cluster -f green-line.yml
When you delete your cluster, make sure to double-check in the AWS Console that the CloudFormation stacks (which eksctl created) have been deleted cleanly.
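If you prefer the CLI, you can also check for leftover stacks belonging to this cluster (the eksctl-green-line name prefix is an assumption based on how eksctl usually names its stacks):
$ aws cloudformation list-stacks \
    --region ap-southeast-1 \
    --stack-status-filter CREATE_COMPLETE DELETE_FAILED \
    --query "StackSummaries[?contains(StackName, 'eksctl-green-line')].[StackName,StackStatus]" \
    --output table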