

Kubernetes Pod Scheduling Mechanism: From Theory to Production Practice

As the "traffic commander" of Kubernetes cluster, the Pod scheduling mechanism directly affects the stability of the application and resource utilization. This article will analyze the working principles of the scheduler in depth, and combine production practical experience to share configuration solutions that can be directly implemented.

1. Core working principle of the scheduler

The scheduler (kube-scheduler) is the cluster's intelligent scheduling center. For each unscheduled Pod it makes two key decisions:

  • Filtering: from all nodes in the cluster, keep only the candidate nodes that satisfy the Pod's hard requirements (see the sketch below)
  • Scoring: score the candidates across multiple dimensions and select the optimal node
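The simplest hard filtering constraint to picture is a nodeSelector: any node that lacks the requested label is filtered out before scoring even begins. A minimal sketch (the disktype=ssd label is illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: filter-demo
spec:
  nodeSelector:
    disktype: ssd   # Filtering: only nodes carrying this label remain candidates
  containers:
  - name: app
    image: nginx:1.21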

2. Core scheduling strategies for production environments

1. Resource scheduling (the foundation of everything)

apiVersion: v1
kind: Pod
metadata:
  name: web-server
spec:
  containers:
  - name: nginx
    image: nginx:1.21
    resources:
      requests:
        memory: "512Mi"
        cpu: "500m"
      limits:
        memory: "1Gi" 
        cpu: "1"

⚠️ Production experience:

  • Requests must be set; without them the scheduler cannot judge whether a node has enough resources.
  • It is recommended that limits not exceed 80% of a node's allocatable resources, to prevent resource exhaustion.
  • Use the Vertical Pod Autoscaler (VPA) to adjust resource parameters automatically (see the sketch below)
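For reference, a minimal VerticalPodAutoscaler manifest might look like the sketch below; it assumes the VPA components are installed in the cluster and targets a hypothetical web-server Deployment:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-server        # hypothetical workload to right-size
  updatePolicy:
    updateMode: "Auto"      # let VPA apply the recommended requests automatically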

2. Affinity Scheduling (Affinity)

Scenario: deploy the cache service in the same availability zone as the database

affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - mysql
      topologyKey: topology.kubernetes.io/zone
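The mirror-image case is anti-affinity: spreading replicas of the same service across nodes so that a single node failure does not take them all down. A minimal sketch, using a preferred rule so scheduling still succeeds on small clusters (the app=cache label is illustrative):

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: cache
        topologyKey: kubernetes.io/hostname   # spread across individual nodes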

3. Taints & Tolerations

Typical applications:

  • Dedicated GPU nodes: gpu=true:NoSchedule
  • Edge nodes: edge=true:NoExecute

The matching toleration on the Pod side:

tolerations:
- key: "gpu"
  operator: "Exists"
  effect: "NoSchedule"

4. Topology Spread Constraints (PodTopologySpread)

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app: frontend
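ScheduleAnyway treats the constraint as a soft preference; swapping in DoNotSchedule makes it a hard rule, at the cost of Pods staying Pending when the skew cannot be satisfied. A stricter variant for comparison, spreading across nodes instead of zones:

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: kubernetes.io/hostname   # spread across individual nodes
  whenUnsatisfiable: DoNotSchedule      # hard constraint: leave the Pod Pending rather than skew
  labelSelector:
    matchLabels:
      app: frontend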

3. Advanced scheduling techniques in practice

1. Priority and preemption

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
description: "Critical Business Priority"

⚠️ Notes:

  • Use the preemption feature with caution; it may cause service interruptions
  • It is recommended to set system components (such as CNI plug-ins) to high priority

2. Scheduler performance optimization

apiVersion: kubescheduler.config.k8s.io/v1beta3
kind: KubeSchedulerConfiguration
percentageOfNodesToScore: 70        # Control the node sampling ratio
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: LeastAllocated    # Prefer nodes with low resource utilization
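The configuration file is passed to kube-scheduler through its --config flag. On a kubeadm-managed control plane that usually means editing the kube-scheduler static Pod manifest; the file paths and image tag below are illustrative:

# Excerpt from a kube-scheduler static Pod manifest (paths and version illustrative)
spec:
  containers:
  - name: kube-scheduler
    image: registry.k8s.io/kube-scheduler:v1.28.4
    command:
    - kube-scheduler
    - --config=/etc/kubernetes/scheduler-config.yaml   # load the KubeSchedulerConfiguration above
    volumeMounts:
    - name: scheduler-config
      mountPath: /etc/kubernetes/scheduler-config.yaml
      readOnly: true
  volumes:
  - name: scheduler-config
    hostPath:
      path: /etc/kubernetes/scheduler-config.yaml
      type: File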

3. Multi-scheduler collaboration

apiVersion: v1
kind: Pod
metadata:
  name: ai-job
spec:
  schedulerName: batch-scheduler  # Specify a dedicated scheduler
  containers:
  - name: worker
    image: nginx:1.21             # placeholder image for illustration

4. Production environment troubleshooting guide

View scheduling events

kubectl describe pod <pod-name> | grep -A 10 Events

Common reasons for scheduling failure

  • Insufficient CPU/Memory (Insufficient resources)
  • No nodes available (node selector does not match)
  • Pod has unbound immediate PersistentVolumeClaims
  • Taint toleration not matched

Recommended diagnostic tools

  • kube-scheduler logs (raise the log verbosity to 4 or higher)
  • Scheduling framework visualization plugins
  • kubectl get pods -o wide to check which node a Pod actually landed on

5. Suggestions for evolving the scheduling strategy

  • Early stage: basic scheduling based on resource requests
  • Development stage: introduce affinity and topology constraints
  • Maturity stage:
    • Combine multiple scheduling strategies
    • Develop custom scheduler plugins
    • Introduce machine-learning-based predictive scheduling

Best practice: use kubectl apply --dry-run=server to verify configurations, and test scheduling robustness with chaos engineering.

By applying these scheduling strategies sensibly, one e-commerce platform raised resource utilization from 35% to 68% and improved the evenness of service distribution across availability zones by 90%. Mastering these core mechanisms will let you build a more efficient and stable Kubernetes cluster.

That is all for this article on the K8s Pod scheduling mechanism. For more related K8s Pod scheduling content, please search my previous articles or continue browsing the related articles below. I hope you will continue to support me!