Kubernetes Pod Scheduling Mechanism: A Practical Guide from Theory to Production
As the "traffic commander" of Kubernetes cluster, the Pod scheduling mechanism directly affects the stability of the application and resource utilization. This article will analyze the working principles of the scheduler in depth, and combine production practical experience to share configuration solutions that can be directly implemented.
1. Core working principle of the scheduler
The scheduler (kube-scheduler) is the scheduling brain of the cluster. For every Pod it makes two key decisions:
- Filtering: Filter out candidate nodes that meet the basic requirements from all nodes in the cluster
- Scoring: Score the candidate nodes across multiple dimensions and select the optimal one
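Both phases are implemented as plugins in the scheduling framework. As a rough illustration (not an exhaustive plugin list, and not a recommended configuration), a scheduler profile could enable filter and score plugins like this:

```yaml
# Illustrative sketch only: shows where filter and score plugins live in a profile.
# The plugin names are real built-ins, but this selection is only for illustration.
apiVersion: kubescheduler.config.k8s.io/v1beta3
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  plugins:
    filter:
      enabled:
      - name: NodeResourcesFit    # does the node have enough CPU/memory for the requests?
      - name: TaintToleration     # does the Pod tolerate the node's taints?
    score:
      enabled:
      - name: NodeResourcesBalancedAllocation  # prefer nodes with balanced resource usage
      - name: ImageLocality                    # prefer nodes that already have the image
```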
2. Core scheduling strategies for production environments
1. Resource scheduling (the most basic of the basics)
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-server
spec:
  containers:
  - name: nginx
    image: nginx:1.21
    resources:
      requests:
        memory: "512Mi"
        cpu: "500m"
      limits:
        memory: "1Gi"
        cpu: "1"
```
⚠️ Production experience:
- Always set requests; otherwise the scheduler cannot determine whether a node has sufficient resources.
- Keep limits below roughly 80% of a node's allocatable resources to prevent resource exhaustion.
- Use the Vertical Pod Autoscaler (VPA) to adjust resource parameters automatically (see the sketch below).
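A minimal VPA sketch, assuming the VPA components are installed in the cluster and that a Deployment named web-server exists to target (both names are only for illustration):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-server        # hypothetical target workload
  updatePolicy:
    updateMode: "Auto"      # VPA may evict Pods to apply updated requests
```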
2. Affinity Scheduling (Affinity)
Scenario: deploy the cache service in the same availability zone as the database
```yaml
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - mysql
      topologyKey: topology.kubernetes.io/zone
```
3. Taints & Tolerations
Typical applications:
- Dedicated GPU nodes: gpu=true:NoSchedule
- Edge nodes: edge=true:NoExecute
```yaml
tolerations:
- key: "gpu"
  operator: "Exists"
  effect: "NoSchedule"
```
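The toleration above only matters if the node actually carries the matching taint. Assuming a node of your own, it can be tainted like this:

```bash
# Taint a GPU node so that only Pods tolerating gpu=true:NoSchedule can land on it
kubectl taint nodes <node-name> gpu=true:NoSchedule
```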
4. Topology spread constraints (PodTopologySpread)
```yaml
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app: frontend
```
3. Advanced scheduling techniques
1. Priority and preemption
```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
description: "Critical business priority"
```
⚠️ Notes:
- Use preemption with caution; evicting lower-priority Pods may interrupt running services.
- It is recommended to give system components (such as CNI plugins) a high priority.
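To actually use the class, a workload references it by name via priorityClassName. A minimal sketch (the Pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: critical-api                 # placeholder name
spec:
  priorityClassName: high-priority   # matches the PriorityClass defined above
  containers:
  - name: api
    image: nginx:1.21                # placeholder image
```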
2. Scheduler performance optimization
```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta3
kind: KubeSchedulerConfiguration
percentageOfNodesToScore: 70      # control how many nodes are sampled for scoring
profiles:
- schedulerName: default-scheduler
  pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: LeastAllocated      # prefer nodes with lower resource allocation
```
3. Multi-scheduler collaboration
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ai-job
spec:
  schedulerName: batch-scheduler   # handled by a dedicated scheduler instead of the default
  containers:
  - name: worker                   # placeholder container so the excerpt is a valid Pod
    image: nginx:1.21
```
4. Production environment troubleshooting guide
View scheduling events:

```bash
kubectl describe pod <pod-name> | grep -A 10 Events
```
Common reasons for scheduling failure:
- Insufficient CPU/Memory (Insufficient resources)
- No nodes available (node selector does not match)
- Pod has unbound immediate PersistentVolumeClaims
- Taint toleration not matched
Recommended diagnostic tools:
- kube-scheduler logs (raise the log verbosity to 4 or higher)
- Scheduler Framework visualization plugins
- kubectl get pods -o wide to check which node a Pod was actually scheduled to
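To see scheduling failures across a namespace in one view, the related events can also be filtered directly (a quick sketch using standard kubectl flags):

```bash
# List FailedScheduling events, newest last
kubectl get events --field-selector reason=FailedScheduling --sort-by=.lastTimestamp
```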
5. Suggestions for evolving the scheduling strategy
- Early stage: basic scheduling based on resource requests
- Development stage: introduce affinity and topology spread constraints
- Maturity stage:
  - combine scheduling strategies across multiple dimensions
  - develop custom scheduler plugins
  - introduce machine-learning-based predictive scheduling
Best practice: use kubectl apply --dry-run=server to validate configuration changes, and test scheduling robustness through chaos engineering.
By applying these scheduling strategies sensibly, one e-commerce platform raised its resource utilization from 35% to 68% and improved the evenness of service distribution across availability zones by 90%. Mastering these core mechanisms will help you build a more efficient and stable Kubernetes cluster.