A Detailed Guide to the Basic Usage of Multithreading and Multiprocessing in Python

Introduction

In Python programming, we often need to handle many tasks at once, such as downloading files in batches, crawling web page data, or performing large-scale calculations. Running these tasks one after another is often too slow. Fortunately, Python provides two approaches to concurrent programming, multithreading and multiprocessing, which can significantly improve the execution efficiency of our programs. This article introduces the basic usage of both in detail and demonstrates them with practical cases and code, so that you can pick up the core concurrent programming skills.

1. The main advantages of concurrent programming

Before we explain in depth, let’s first understand the main advantages of concurrent programming:

Improve program execution speed: Multiple tasks can be run simultaneously, reducing waiting time.
Improve CPU and I/O resource utilization: Multi-processes can make full use of multi-core CPUs, and multi-threading can optimize I/O tasks.
Improve the responsiveness of the program: suitable for GUI programs, crawlers, file processing and other scenarios.

2. Python multi-threading (Threading)

1. What is multithreading?

Multithreading allows a program to run multiple threads within the same process, and each thread can perform an independent task. It is especially suitable for I/O-intensive tasks, such as network requests and file reading and writing. Python provides the threading module, which makes multithreaded programming straightforward.

2. Multithreaded example

Suppose we have a task that requires downloading 10 files, and each file takes about 5 seconds to download. Executed in sequence, the downloads take about 50 seconds in total. If we instead use multiple threads so that the downloads wait at the same time, the total time drops to roughly that of the slowest single download.

Here is a simple multithreaded sample code:

import threading
import time
 
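def download_file(file_name):
    print(f"Start downloading {file_name}...")
    time.sleep(5)  # Simulate the download time
    print(f"{file_name} download completed!")
 
files = ["", "", ""]
threads = []
 
for file in files:
    # Create one thread per file and start it immediately
    thread = threading.Thread(target=download_file, args=(file,))
    threads.append(thread)
    thread.start()
 
# Wait for every thread to finish before continuing
for thread in threads:
    thread.join()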
 
print("All files are downloaded!")

Code walkthrough:

threading.Thread(target=download_file, args=(file,)): Creates a thread; each thread executes the download_file() function.
thread.start(): Starts the thread.
thread.join(): Waits for the thread to finish, which ensures that all tasks are completed before the main program continues.
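The timing claim from the download scenario can be checked with a scaled-down sketch. The helper names fake_download, run_sequential, and run_threaded are hypothetical, time.sleep stands in for real network I/O, and the delay is shortened to 0.05 seconds so that the script finishes quickly.

```python
import threading
import time

def fake_download(delay):
    # Stand-in for a real download: sleeping releases the GIL, like waiting on I/O
    time.sleep(delay)

def run_sequential(task_count, delay):
    # Run the tasks one after another and return the elapsed seconds
    start = time.perf_counter()
    for _ in range(task_count):
        fake_download(delay)
    return time.perf_counter() - start

def run_threaded(task_count, delay):
    # Run each task in its own thread and return the elapsed seconds
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_download, args=(delay,))
               for _ in range(task_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"Sequential: {run_sequential(10, 0.05):.2f} s")
    print(f"Threaded:   {run_threaded(10, 0.05):.2f} s")
```

The sequential time should be close to the number of tasks multiplied by the delay, while the threaded time should stay near a single delay, because the threads wait concurrently.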

3. Applicable scenarios for multithreading

Multithreading is suitable for I/O-intensive tasks, such as crawling web page data or reading and writing files. However, because of the global interpreter lock (GIL) in the standard CPython interpreter, only one thread executes Python bytecode at a time. For CPU-intensive tasks (such as mathematical calculations or image processing), multithreading therefore provides only pseudo-parallelism rather than true parallelism, and multiprocessing is recommended instead.
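One way to observe the effect of the GIL is to run the same CPU-bound function through a thread pool and a process pool and compare the timings. The following is a minimal sketch: count_primes and timed_map are hypothetical names, the workload is deliberately naive, and the measured times depend on the machine and interpreter, so no particular numbers are claimed.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import time

def count_primes(limit):
    # Deliberately naive, CPU-bound work: count the primes below `limit`
    count = 0
    for candidate in range(2, limit):
        if all(candidate % divisor for divisor in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count

def timed_map(executor_class, limits):
    # Run count_primes over `limits` with the given executor; return (results, seconds)
    start = time.perf_counter()
    with executor_class(max_workers=4) as executor:
        results = list(executor.map(count_primes, limits))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    limits = [100_000] * 4
    for executor_class in (ThreadPoolExecutor, ProcessPoolExecutor):
        results, seconds = timed_map(executor_class, limits)
        print(f"{executor_class.__name__}: {seconds:.2f} s, results {results}")
```

Both pools return identical results; on a multi-core machine with the standard interpreter, the process pool would be expected to finish sooner, since its workers are not serialized by a shared GIL.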

3. Multiprocessing in Python

1. What is multiprocessing?

Multiprocessing allows a program to run multiple processes at the same time. Each process has its own memory space, so the program can make full use of multi-core CPUs for true parallel computing. Multiprocessing is suitable for CPU-intensive tasks, such as scientific computing, data processing, and image processing. Python provides the multiprocessing module for creating and managing processes.

2. Multi-process example

Here is a simple multi-process sample code for calculating the square of multiple numbers:

import multiprocessing
import time
 
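def compute_square(n):
    print(f"Calculating the square of {n}...")
    time.sleep(2)  # Simulate the computation time
    print(f"The square of {n} is {n**2}")
 
# The guard is required on platforms that spawn child processes (Windows, macOS)
if __name__ == "__main__":
    numbers = [2, 4, 6, 8]
    processes = []
 
    for num in numbers:
        # Create one process per number and start it immediately
        process = multiprocessing.Process(target=compute_square, args=(num,))
        processes.append(process)
        process.start()
 
    # Wait for every process to finish before continuing
    for process in processes:
        process.join()
 
    print("All calculations are completed!")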

Code walkthrough:

multiprocessing.Process(target=compute_square, args=(num,)): Creates a process; each process executes the compute_square() function.
process.start(): Starts the process.
process.join(): Waits for the process to finish, which ensures that all tasks are completed before the main program continues.

3. Applicable scenarios and limitations of multi-process

Multi-processes are suitable for CPU-intensive tasks, such as complex mathematical calculations, image processing, big data analysis, etc. However, multi-processes also have some limitations:

The overhead of process creation and management is greater than that of threads.
Data sharing between processes is more complicated, because processes do not share memory; it typically relies on tools such as multiprocessing.Queue or multiprocessing.Manager.
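As an illustration of the second limitation, results can be passed back to the parent through a multiprocessing.Queue. This is a minimal sketch; square_worker and result_queue are hypothetical names chosen for the example.

```python
import multiprocessing

def square_worker(n, result_queue):
    # Runs in a child process; the queue carries the result back to the parent
    result_queue.put((n, n * n))

if __name__ == "__main__":
    result_queue = multiprocessing.Queue()
    numbers = [2, 4, 6, 8]
    processes = [multiprocessing.Process(target=square_worker, args=(n, result_queue))
                 for n in numbers]
    for process in processes:
        process.start()
    # Read one result per process before joining, so no child blocks on a full pipe
    results = dict(result_queue.get() for _ in numbers)
    for process in processes:
        process.join()
    print(sorted(results.items()))
```

The results arrive in completion order, not submission order, which is why each result is tagged with its input and the output is sorted before printing.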

4. ThreadPoolExecutor & ProcessPoolExecutor

When there are many tasks to run, creating and managing each thread or process by hand becomes cumbersome. For convenience, Python provides thread pools and process pools through the concurrent.futures module.

1. Thread pool example

Here is sample code that uses a thread pool to download the content of multiple URLs:

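from concurrent.futures import ThreadPoolExecutor
import requests
 
def download_url(url):
    # Fetch the URL and return the response body
    response = requests.get(url)
    return response.content
 
urls = ['', '', '']
 
with ThreadPoolExecutor(max_workers=3) as executor:
    # map() runs download_url on each URL using the pool's worker threads
    results = list(executor.map(download_url, urls))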
 
print("Download Completed")

In this example, ThreadPoolExecutor downloads the content of multiple URLs concurrently. The pool reuses up to three worker threads instead of creating one thread per task, which reduces thread-creation overhead and shortens the total download time.
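A detail worth knowing about the map() method of an executor is that it returns results in input order, even when tasks finish in a different order. The sketch below shows this without any network access; fake_fetch and fetch_all are hypothetical names, and time.sleep stands in for a request of varying duration.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fake_fetch(delay):
    # Stand-in for a network request that takes `delay` seconds
    time.sleep(delay)
    return delay

def fetch_all(delays, max_workers=3):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in input order, not completion order
        return list(executor.map(fake_fetch, delays))

if __name__ == "__main__":
    print(fetch_all([0.3, 0.1, 0.2]))  # prints [0.3, 0.1, 0.2]
```

This ordering makes it safe to pair each result with its URL by position, for example with zip(urls, results).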

2. Process pool example

Here is example code that uses a process pool to calculate the squares of a large number of values:

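from concurrent.futures import ProcessPoolExecutor
 
def square_number(n):
    return n * n
 
# The guard is required on platforms that spawn child processes (Windows, macOS)
if __name__ == "__main__":
    numbers = list(range(1000000))
 
    with ProcessPoolExecutor(max_workers=4) as executor:
        # chunksize sends the inputs to the workers in batches instead of one by one
        results = list(executor.map(square_number, numbers, chunksize=10000))
 
    print("Computation Completed", results[:10])  # Print the first 10 results as a check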

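In this example, ProcessPoolExecutor distributes the one million squaring operations across four worker processes. Note that for a function this cheap, the cost of sending data between processes can outweigh the gain from parallelism; process pools pay off when each task involves substantial computation.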
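Whether a process pool actually helps for a given workload is easiest to settle by measuring. The following sketch compares a plain loop with a process pool for the same squaring task; time_serial and time_pool are hypothetical helpers, the input size is reduced to keep the run short, and the outcome depends on the machine, so no timings are claimed here.

```python
from concurrent.futures import ProcessPoolExecutor
import time

def square_number(n):
    return n * n

def time_serial(numbers):
    # Baseline: compute every square in the current process
    start = time.perf_counter()
    results = [square_number(n) for n in numbers]
    return results, time.perf_counter() - start

def time_pool(numbers, chunksize):
    # Same work through a process pool, with the inputs sent in batches
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(square_number, numbers, chunksize=chunksize))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    numbers = list(range(200_000))
    serial_results, serial_seconds = time_serial(numbers)
    pool_results, pool_seconds = time_pool(numbers, chunksize=10_000)
    print(f"Serial: {serial_seconds:.3f} s")
    print(f"Pool:   {pool_seconds:.3f} s")
    print("Same results:", serial_results == pool_results)
```

The chunksize argument of map() is honored by ProcessPoolExecutor and controls how many inputs travel to a worker per round trip; trying a few values is a reasonable way to tune it.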

5. Choose the right concurrency method

When choosing to use multi-threading or multi-processing, the following factors should be considered:

Task type: I/O-intensive tasks are more suitable for multi-threading, and CPU-intensive tasks are more suitable for multi-processing.
Resource consumption: The resource consumption of threads is smaller than that of processes, but due to the existence of GIL, multithreading is inefficient in CPU-intensive tasks.
Code complexity: Multi-process code is usually more complex than multi-threading, but it can effectively avoid the impact of GIL.
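The first factor can be written down as a tiny helper that maps a task type to an executor class. This is only a sketch of the rule of thumb above: executor_class_for and the labels "io" and "cpu" are invented for the example and are not part of any library.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def executor_class_for(task_type):
    # "io" and "cpu" are labels invented for this sketch
    if task_type == "io":
        return ThreadPoolExecutor   # Threads overlap waiting time cheaply
    if task_type == "cpu":
        return ProcessPoolExecutor  # Processes run in parallel on multiple cores
    raise ValueError(f"Unknown task type: {task_type!r}")

if __name__ == "__main__":
    with executor_class_for("io")(max_workers=3) as executor:
        print(list(executor.map(len, ["a", "bb", "ccc"])))  # prints [1, 2, 3]
```

Because both executor classes share the same interface, the calling code stays the same whichever one is chosen.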

In practical applications, a program may need to handle both I/O-intensive and CPU-intensive tasks. For example, a web crawler can use multithreading to download web pages and multiprocessing to parse and process the downloaded content. This makes fuller use of system resources and improves overall performance.

Here is a comprehensive example showing how to use multithreading to download data and process it using multiprocessing:

import requests
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
 
def download_url(url):
    # Fetch the URL and return the page source as text
    response = requests.get(url)
    return response.text
 
def extract_text(html):
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')
    return soup.get_text()
 
def count_words(text):
    # Split on whitespace and count the pieces
    return len(text.split())
 
# The guard is required on platforms that spawn child processes (Windows, macOS)
if __name__ == "__main__":
    urls = ['', '', '']
 
    # Use multithreading to download the data
    with ThreadPoolExecutor(max_workers=3) as executor:
        html_contents = list(executor.map(download_url, urls))
 
    # Use multiprocessing to process the data
    with ProcessPoolExecutor(max_workers=4) as executor:
        texts = list(executor.map(extract_text, html_contents))
        word_counts = list(executor.map(count_words, texts))
 
    print("Web page download and data processing completed")
    print("Word Statistics:", word_counts)

In this example, we first use multithreading to download the web content, then use multiprocessing to extract the text and count the words, so that each stage runs on the concurrency model that suits it. This combination of multithreading and multiprocessing is useful in typical scenarios such as web crawlers and data processing.
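One possible refinement of the processing stage is to do the extraction and the counting in a single function, so that each page crosses the process boundary once instead of twice. The sketch below is an assumption-laden variant, not the method used above: it swaps BeautifulSoup for the standard library's html.parser.HTMLParser so that it runs without extra packages, and TextCollector and words_in_html are hypothetical names.

```python
from concurrent.futures import ProcessPoolExecutor
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    # Collects the text nodes of an HTML document
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def words_in_html(html):
    # Extract the text and count the words in a single call
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return len(" ".join(collector.parts).split())

if __name__ == "__main__":
    pages = ["<p>Hello <b>big</b> world</p>", "<h1>Two words</h1>"]
    with ProcessPoolExecutor(max_workers=2) as executor:
        print(list(executor.map(words_in_html, pages)))  # prints [3, 2]
```

With this shape, only the raw HTML goes to a worker and only an integer comes back, which keeps inter-process traffic small.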

6. Summary

Multithreading and multiprocessing are important tools for improving program execution efficiency in Python. Multithreading is suitable for I/O-intensive tasks, while multiprocessing is suitable for CPU-intensive tasks. Thread pools and process pools further reduce the complexity of concurrent programming. When choosing a concurrency method, weigh the task type, resource consumption, and code complexity together. I hope this article helps you better understand and apply multithreading and multiprocessing in Python, so that your programs run faster and more efficiently!
