
K8s cluster restart and recovery - how to stop and start a Worker node

1 Application scenarios

Scenario

  • In actual work, a Worker node may need maintenance or migration
  • We need to stop and start the node smoothly
  • The impact on the cluster and its services during the stop and start should be minimized

Notes

  • This procedure covers taking a Worker node out of service
  • The workloads (Pods) on the Worker node will be evicted to other nodes
  • Make sure the remaining cluster resources are sufficient to absorb them (a quick capacity check is sketched below)
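Before cordoning anything, it helps to confirm that the remaining nodes can absorb the evicted Pods. A minimal capacity check, assuming the metrics-server add-on is installed (required by kubectl top):

# Current CPU/memory usage per node (requires metrics-server)
kubectl top nodes
# CPU/memory requests and limits already allocated on each node
kubectl describe nodes | grep -A 5 "Allocated resources"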

2 Operation steps

2.1 Stop Worker node scheduling

# View node information (<node_name> below stands for the node being maintained)
root@sh-gpu091:~# kubectl get node
NAME                 STATUS     ROLES   AGE    VERSION
172.19.13.31         Ready      node    403d   v1.14.1
<node_name>          Ready      node    403d   v1.14.1
...
# Stop scheduling on the Worker node
root@sh-gpu091:~# kubectl cordon <node_name>
node/<node_name> cordoned
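Note that cordon only marks the node unschedulable; Pods already running on it are untouched. Under the hood it just sets spec.unschedulable=true on the Node object, so the following patch is an equivalent sketch, shown only to illustrate what cordon does:

# Equivalent to "kubectl cordon <node_name>"
kubectl patch node <node_name> -p '{"spec":{"unschedulable":true}}'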
  • Check the node status
root@sh-gpu091:~# kubectl get node
NAME                 STATUS                     ROLES   AGE    VERSION
172.19.13.31         Ready                      node    403d   v1.14.1
...
<node_name>          Ready,SchedulingDisabled   node    403d   v1.14.1

2.2 Evict workloads on the Worker node

# --ignore-daemonsets: ignore DaemonSet-managed Pods when draining
# --delete-local-data: delete the Pods' emptyDir (temporary) data when draining; persistent data is not deleted
root@sh-gpu091:~# kubectl drain <node_name> --delete-local-data --ignore-daemonsets --force
node/<node_name> already cordoned
WARNING: ignoring DaemonSet-managed Pods: cattle-system/cattle-node-agent-8wcvs, kube-system/kube-flannel-ds-kqzhc, kube-system/nvidia-device-plugin-daemonset-rr2lf, monitoring/prometheus-node-exporter-xtbxp
evicting pod "model-server-0"
evicting pod "singleview-proxy-client-pbdownloader-0"
evicting pod "singleview-proxy-service-0"
pod/singleview-proxy-client-pbdownloader-0 evicted
pod/singleview-proxy-service-0 evicted
pod/model-server-0 evicted
node/<node_name> evicted
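If drain hangs at "evicting pod ...", a PodDisruptionBudget may be refusing the eviction; the check below is a starting point. Also note that on newer kubectl releases (v1.20+) the --delete-local-data flag has been renamed --delete-emptydir-data.

# List PodDisruptionBudgets that could block eviction
kubectl get pdb -A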

2.3 Stop Docker, Kubelet and other services

systemctl stop kubelet 
systemctl stop docker
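kubelet is stopped first so that it cannot restart containers while Docker shuts down. As a quick sanity check on the node itself (assuming both services are managed by systemd):

# Both should report "inactive"
systemctl is-active kubelet docker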
  • Check whether any Pods remain on the node
kubectl get pod -A -o wide | grep <node_name>
  • If the node does not need to be recovered, you can delete it from the cluster and confirm the remaining nodes
root@sh-gpu091:~# kubectl delete node <node_name>
node "<node_name>" deleted
root@sh-gpu091:~# kubectl get node
NAME                 STATUS     ROLES   AGE    VERSION
172.19.13.31         Ready      node    403d   v1.14.1
...
root@sh-gpu091:~#
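If instead you keep the node registered for later recovery, expect its STATUS to flip to NotReady once the control plane notices the stopped kubelet (after the node-monitor grace period, 40s by default):

kubectl get node <node_name>
# NAME          STATUS                        ROLES   AGE    VERSION
# <node_name>   NotReady,SchedulingDisabled   node    403d   v1.14.1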

2.4 Recover Worker nodes

systemctl start docker
systemctl status docker
systemctl start kubelet
systemctl status kubelet
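Once kubelet is back, the node should return to Ready but remain cordoned: the unschedulable flag is stored on the Node object, so SchedulingDisabled survives the restart (assuming the node was not deleted in step 2.3). A quick check before moving on:

# Expect STATUS to read Ready,SchedulingDisabled
kubectl get node <node_name>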

2.5 Allow Worker node scheduling

# Re-enable scheduling on the node
kubectl uncordon <node_name>
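Finally, confirm the node is schedulable again. Note that uncordon does not move the evicted Pods back; the node simply becomes a candidate for newly scheduled Pods.

# STATUS should now read plain "Ready"
kubectl get node <node_name>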

Summary

The above is based on my personal experience. I hope it can serve as a useful reference, and I appreciate your support.