When one of the node goes down for a short time and immediately came back, the kubelet process starts and the Pods come back online. If the node is down for more than 5 mins, the Pods are terminated from that node.
If the Pod is a part of a ReplicaSet, then the Pod are recreated on the next available node.
For maintenance purposes, we can purpose;y drain the nodes so that the Pods are gracefully terminated and scheduled on the next available node before the main node goes down.
kubectl drain node-1
To mark a node be “unscheduleable”, we need to cordon it. Unlike drain, cordoning doesn’t terminate any running Pods. it just ensures that no new Pods will be scheduled onto the marked node.
kubectl cordon node-1
When the node is brought up again, it still won’t serve any of the Pods. We need to uncordon it first. Note that the Pods running on the other nodes will not automatically move to the new node. However, new Pods may be schedule onto the new node.
kubectl uncordon node-1
We can see the Kubernetes version when we retrieve the nodes:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready <none> 7h21m v1.11.3
worker1 Ready <none> 7h21m v1.11.3
worker2 Ready <none> 7h21m v1.11.3
The version number:
Note that Kubernetes only supports the three latest versions.
For managed Kubernetes clusters, they can be updated through the Management Console provider.
For Kubernetes cluster which are created through kubeadm, upgrading them involves two steps:
The worker nodes can be upgraded through:
All at once, which means the entire cluster is down and unable to serve any request.
One node at a time, which allows the workload to be moved to the other available nodes while the first node is being upgraded.
Add new nodes with the new software versions, which moves the workload to the new nodes and the remove the old node.
To see a cluster upgrade in action, check out this lab
To quickly backup all the resource configs, we can retrieve all the resources in all namespaces and forward their YAML output to a file.
kubectl get all --all-namespaces -o yaml > all-resources.yml
Another way is to ensure that all the manifests are safely stored in a Git repository.
To backup the etcd server, we can backup the data directory:
/var/lib/etcd
We can also make use of a built-in snapshot solution called etcdctl which will create snapshot file.
ETCDCTL_API=3 etcdctl \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/peer.crt \
--key=/etc/kubernetes/pki/etcd/peer.key \
snapshot save /snapshots/backup.db
Note that the etdctl command requires the following to encrypt the etcd traffic:
To restore the etcd server from the snapshot:
ETCDCTL_API=3 etcdctl \
snapshot restore /snapshots/backup.db \
--data-dir /var/lib/etcd-from-backup
--initial-cluster master-1=https://192.168.3.10:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-advertise-peer-urls https://${INTERNAL_IP}:2380
We can also use another Docker -compatible CLI for containerd called nerdctl
sudo nerdctl run --rm \
-v '/snapshots:/snapshots' \
-v '/var/lib/etcd:/var/lib/etcd' \
-e ETCDCTL_API=3 \
'k8s.gcr.io/etcd:3.5.3-0' \
/bin/sh -c "etcdctl snapshot restore --data-dir /var/lib/etcd /snapshots/backup.db"
Make sure to reload the daemon and restart etcd.
systemctl daemon-reload
systemctl restart etcd
To see a backup and restore in action, check out this lab