
A Detailed Guide to Keeping Python Scripts Running in the Background

1. Panoramic analysis of production environment requirements

1.1 Industrial-grade requirements matrix for background processes

| Dimension | Development environment requirements | Production environment requirements | Disaster recovery requirements |
|---|---|---|---|
| Reliability | Single-point operation | Cluster deployment | Cross-datacenter disaster recovery |
| Observability | Console output | Centralized logging | Distributed tracing |
| Resource management | Unlimited | CPU/memory limits | Dynamic resource scheduling |
| Lifecycle management | Manual start and stop | Automatic restart on failure | Rolling upgrades |
| Security | Normal permissions | Principle of least privilege | Security sandbox |

1.2 Analysis of typical application scenarios

IoT data collection: 24/7 operation, automatic reconnection after network drops, resource-constrained environments

Financial trading systems: sub-millisecond latency, zero tolerance for process interruption

AI training tasks: GPU resource management, guarantees for long-running jobs

Web services: high-concurrency processing, graceful start/stop mechanism (see the sketch below)
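
A process managed by Supervisor, systemd, or Kubernetes is normally stopped with SIGTERM, so a graceful start/stop mechanism usually comes down to handling that signal and exiting at a safe point. The following is a minimal sketch of that idea; the process_batch function and the loop structure are illustrative assumptions, not part of the original article.

import signal
import sys
import time

shutdown_requested = False

def handle_sigterm(signum, frame):
    # Mark the process for shutdown; the main loop exits at the next safe point
    global shutdown_requested
    shutdown_requested = True

signal.signal(signal.SIGTERM, handle_sigterm)
signal.signal(signal.SIGINT, handle_sigterm)

def process_batch():
    # Placeholder for one unit of real work
    time.sleep(1)

while not shutdown_requested:
    process_batch()

# Flush buffers, close connections, and persist state here before exiting
sys.exit(0)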

2. Advanced process management solutions

2.1 Professional management with Supervisor

Architecture principle:

+---------------------+
|   Supervisor Daemon |
+----------+----------+
           |
           | manages child process
+----------v----------+
|   Managed Process   |
|  (Python Script)    |
+---------------------+

Configuration example (a file under /etc/supervisor/conf.d/, e.g. webapi.conf; the script and log file names below are illustrative):

[program:webapi]
; script and log file names are illustrative placeholders
command=/opt/venv/bin/python /app/main.py
directory=/app
user=appuser
autostart=true
autorestart=true
startsecs=3
startretries=5

stdout_logfile=/var/log/webapi.out.log
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=10

stderr_logfile=/var/log/webapi.err.log
stderr_logfile_maxbytes=50MB
stderr_logfile_backups=10

environment=PYTHONPATH="/app",PRODUCTION="1"

Core functions:

  • Automatic restart after abnormal process exit
  • Log rotation management
  • Resource usage monitoring
  • Web UI Management Interface
  • Event notification (email/Slack)
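
These functions can also be driven programmatically through Supervisor's XML-RPC interface. The sketch below assumes the [inet_http_server] section of supervisord.conf is enabled on port 9001 and reuses the webapi program name from the configuration above; host and port are assumptions for illustration.

from xmlrpc.client import ServerProxy

# Connect to Supervisor's XML-RPC endpoint (requires [inet_http_server] in supervisord.conf)
server = ServerProxy('http://localhost:9001/RPC2')

# Query the state of the managed program and restart it if it is not running
info = server.supervisor.getProcessInfo('webapi')
if info['statename'] != 'RUNNING':
    server.supervisor.startProcess('webapi')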

2.2 Kubernetes containerized deployment

Deployment configuration example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: data-processor
spec:
  replicas: 3
  selector:
    matchLabels:
      app: data-processor
  template:
    metadata:
      labels:
        app: data-processor
    spec:
      containers:
      - name: main
        image: registry.example.com/data-processor:v1.2.3  # registry host is a placeholder; it was elided in the original
        resources:
          limits:
            cpu: "2"
            memory: 4Gi
          requests:
            cpu: "1"
            memory: 2Gi
        livenessProbe:
          exec:
            # the health-check script name below is illustrative
            command: ["python", "/app/healthcheck.py"]
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
        volumeMounts:
          - name: config-volume
            mountPath: /app/config
      volumes:
        - name: config-volume
          configMap:
            name: app-config

Key Benefits:

  • Automatic horizontal expansion
  • Rolling update strategy
  • Self-healing mechanism
  • Resource isolation guarantee
  • Cross-node scheduling capability

3. High availability architecture design

3.1 Multi-live architecture implementation

# Distributed lock example (Redis implementation)
import time

import redis
from redis.lock import Lock

class HAWorker:
    def __init__(self):
        self.redis = redis.Redis(host='redis-cluster', port=6379)
        self.lock_name = "task:processor:lock"

    def run(self):
        while True:
            # Only the instance holding the lock processes data; the others wait
            with Lock(self.redis, self.lock_name, timeout=30, blocking_timeout=5):
                self.process_data()

            time.sleep(1)

    def process_data(self):
        # Core business logic
        pass

3.2 Heartbeat detection mechanism

# Prometheus-based liveness detection
import time

from prometheus_client import start_http_server, Gauge

class HeartbeatMonitor:
    def __init__(self, port=9000):
        self.heartbeat = Gauge('app_heartbeat', 'Last successful heartbeat')
        start_http_server(port)

    def update(self):
        self.heartbeat.set_to_current_time()

# Integrate in business code
monitor = HeartbeatMonitor()
while True:
    process_data()          # actual business logic
    monitor.update()
    time.sleep(60)

4. Advanced operations and maintenance techniques

4.1 Comparison of log management solutions

| Solution | Collection method | Query performance | Storage cost | Applicable scenarios |
|---|---|---|---|---|
| ELK Stack | Logstash | High | High | Big data analysis |
| Loki + Promtail | Promtail | Medium | Low | Kubernetes environments |
| Splunk | Universal Forwarder | Very high | Very high | Enterprise security auditing |
| Graylog | Syslog | Medium | Medium | Medium-sized enterprises |
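
Whichever stack is chosen, collection is easier if the application already emits structured, rotated logs that a shipper can parse line by line. Below is a minimal sketch using only the standard library; the log path, logger name, and JSON field names are illustrative assumptions.

import json
import logging
from logging.handlers import RotatingFileHandler

class JsonFormatter(logging.Formatter):
    # Render each record as one JSON line, which Promtail/Logstash/etc. parse easily
    def format(self, record):
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = RotatingFileHandler("/var/log/app/app.json.log",
                              maxBytes=50 * 1024 * 1024, backupCount=10)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("data batch processed")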

4.2 Performance metrics monitoring

# Use psutil for resource monitoring
import psutil

def monitor_resources():
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),
        "memory_used": psutil.virtual_memory().used / 1024**3,   # GB
        "disk_io": psutil.disk_io_counters().read_bytes,
        "network_io": psutil.net_io_counters().bytes_sent
    }

# Integrate into a Prometheus exporter
from prometheus_client import Gauge

cpu_gauge = Gauge('app_cpu_usage', 'CPU usage percentage')
mem_gauge = Gauge('app_memory_usage', 'Memory usage in GB')

def update_metrics():
    metrics = monitor_resources()
    cpu_gauge.set(metrics['cpu_percent'])
    mem_gauge.set(metrics['memory_used'])
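
To actually expose these gauges, the exporter still needs an HTTP endpoint and a refresh loop. A minimal wiring sketch follows; the port 9100 and the 15-second interval are arbitrary choices, not from the original article.

import time
from prometheus_client import start_http_server

if __name__ == "__main__":
    start_http_server(9100)   # metrics served at http://<host>:9100/metrics
    while True:
        update_metrics()
        time.sleep(15)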

5. Security hardening practices

5.1 Implementing the principle of least privilege

# Create a dedicated user
sudo useradd -r -s /bin/false appuser

# Set file permissions
sudo chown -R appuser:appgroup /opt/app
sudo chmod 750 /opt/app

# Use capabilities instead of root
sudo setcap CAP_NET_BIND_SERVICE=+eip /opt/venv/bin/python
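
If granting capabilities to the interpreter is not desirable, a common alternative is to start as root, bind the privileged resource, and then drop to the dedicated user from inside Python. The sketch below assumes the appuser account created above; it is an illustration, not part of the original commands.

import os
import pwd

def drop_privileges(username="appuser"):
    # Switch the running process from root to an unprivileged account
    if os.getuid() != 0:
        return  # already unprivileged
    pw = pwd.getpwnam(username)
    os.setgroups([])        # drop supplementary groups
    os.setgid(pw.pw_gid)
    os.setuid(pw.pw_uid)
    os.umask(0o077)

# Bind privileged ports or open protected files first, then:
drop_privileges()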

5.2 Security Sandbox Configuration

# Use seccomp to restrict system calls
import prctl                                        # python-prctl
from seccomp import SyscallFilter, ALLOW, KILL      # libseccomp Python bindings

def enable_sandbox():
    # Adopt orphaned child processes and forbid gaining new privileges
    prctl.set_child_subreaper(1)
    prctl.set_no_new_privs(1)

    # Only allow the listed system calls; anything else kills the process
    filter = SyscallFilter(defaction=KILL)
    filter.add_rule(ALLOW, "read")
    filter.add_rule(ALLOW, "write")
    filter.add_rule(ALLOW, "poll")
    filter.load()

6. Disaster recovery strategies

6.1 State persistence scheme

# Checkpoint-based state recovery
import pickle
import time
from datetime import datetime

class StateManager:
    def __init__(self):
        self.state_file = "/var/run/app_state.pkl"

    def save_state(self, data):
        with open(self.state_file, 'wb') as f:
            pickle.dump({
                'timestamp': datetime.now(),
                'data': data
            }, f)

    def load_state(self):
        try:
            with open(self.state_file, 'rb') as f:
                return pickle.load(f)
        except FileNotFoundError:
            return None

# Integrate in business logic
state_mgr = StateManager()
last_state = state_mgr.load_state()

while True:
    current_state = process_data(last_state)
    state_mgr.save_state(current_state)
    last_state = current_state
    time.sleep(60)
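
One caveat not covered in the snippet above: if the process dies halfway through save_state, the checkpoint file is left corrupted. A common fix is to write to a temporary file and atomically rename it; a minimal sketch of that variant follows, assuming the same pickle-based format.

import os
import pickle
import tempfile
from datetime import datetime

def save_state_atomically(state_file, data):
    # Write the checkpoint to a temp file, then atomically replace the old one
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(state_file))
    with os.fdopen(fd, 'wb') as f:
        pickle.dump({'timestamp': datetime.now(), 'data': data}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, state_file)   # atomic on POSIX filesystems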

6.2 Cross-regional disaster recovery deployment

# AWS multi-region deployment example (Terraform)
resource "aws_instance" "app_east" {
  provider      = aws.us-east-1
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.medium"    # instance type was elided in the original; placeholder value
  count         = 3
}

resource "aws_instance" "app_west" {
  provider      = aws.us-west-2
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.medium"    # placeholder value
  count         = 2
}

resource "aws_route53_record" "app" {
  zone_id = var.dns_zone
  name    = "app.example.com"    # placeholder domain name
  type    = "CNAME"
  ttl     = "300"
  records = [
    aws_lb.app_east.dns_name,
    aws_lb.app_west.dns_name
  ]
}

7. Performance tuning practice

7.1 Memory optimization tips

# Use __slots__ to reduce memory usage
class DataPoint:
    __slots__ = ['timestamp', 'value', 'quality']

    def __init__(self, ts, val, q):
        self.timestamp = ts
        self.value = val
        self.quality = q

# Use memory_profiler to analyze memory consumption
from memory_profiler import profile

@profile
def process_data():
    data = [DataPoint(i, i * 0.5, 1) for i in range(1000000)]
    return sum(d.value for d in data)

7.2 CPU-intensive task optimization

# Use Cython to speed up a hot loop
# File: (name elided in the original; e.g. compute.pyx)
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def calculate(double[:] array):
    cdef double total = 0.0
    cdef int i
    for i in range(array.shape[0]):
        total += array[i] ** 2
    return total
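
A .pyx module like this must be compiled before it can be imported. A minimal build script is sketched below; the module name compute.pyx is an assumption, since the file name was not given in the original.

# setup.py: build the extension with `python setup.py build_ext --inplace`
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize("compute.pyx",
                          compiler_directives={"language_level": "3"}),
)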

# Use multiprocessing to parallelize CPU-bound work
from multiprocessing import Pool

def parallel_process(data_chunks):
    # process_chunk is the per-chunk worker function, defined elsewhere
    with Pool(processes=8) as pool:
        results = pool.map(process_chunk, data_chunks)
    return sum(results)

8. Future evolution direction

8.1 Serverless architecture transformation

# AWS Lambda function example
import boto3

def lambda_handler(event, context):
    s3 = boto3.client('s3')

    # Handle S3 events
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']

        # Execute processing logic
        process_file(bucket, key)

    return {
        'statusCode': 200,
        'body': 'Processing completed'
    }

8.2 Building an intelligent operations system

# Machine-learning-based anomaly detection
from sklearn.ensemble import IsolationForest

class AnomalyDetector:
    def __init__(self):
        self.model = IsolationForest(contamination=0.01)

    def train(self, metrics_data):
        self.model.fit(metrics_data)

    def predict(self, current_metrics):
        # Returns -1 for anomalous samples, 1 for normal ones
        return self.model.predict([current_metrics])[0]

# Integrate into the monitoring system
detector = AnomalyDetector()
detector.train(historical_metrics)

current = collect_metrics()
if detector.predict(current) == -1:
    trigger_alert()

9. Summary of industry best practices

Financial industry: active-active architecture, RTO < 30 seconds, RPO = 0

E-commerce systems: elastic scaling and capacity planning to absorb traffic peaks

IoT platforms: edge computing plus cloud collaboration architecture

AI platforms: shared GPU scheduling and preemptive task management

This concludes the detailed guide to keeping Python scripts running in the background. For more on running Python scripts in the background, please search my previous articles or browse the related articles below. I hope you will continue to support me in the future!