Introduction
Hello everyone. When our work involves processing large amounts of data, parallel computing, or concurrent tasks, Python's multiprocessing module is a powerful and practical tool. With it, we can easily take advantage of multi-core processors by assigning tasks to multiple processes that execute simultaneously, improving program performance and efficiency. In this article we will explore how to implement multi-process programming using the multiprocessing module, introduce the concept and usage of the process pool and how to use it to manage and schedule multiple processes, and discuss key issues such as handling concurrent tasks, inter-process communication, and result collection. I hope it brings some help to everyone's work.
1. Introduction
Python multi-process is a parallel programming model that allows multiple processes to be executed simultaneously in a Python program. Each process has its own independent memory space and execution environment, and can execute tasks in parallel, thereby improving program performance and efficiency.
Advantages:
Parallel processing: Multiple processes can execute multiple tasks at the same time, making full use of multi-core processors to achieve parallel processing. This can significantly improve the performance and efficiency of a program, especially in computation-intensive scenarios.
Independence: Each process has its own independent address space and execution environment, and processes do not interfere with each other. This means that each process can perform tasks independently and will not be affected by other processes. This independence makes multi-process programming more robust and reliable.
Memory isolation: Since each process has its own address space, the data between multiple processes is isolated from each other. This means that variables and data between different processes do not affect each other, reducing the complexity of data sharing and synchronization.
Failure isolation: If one process crashes or errors occur, it will not affect the execution of other processes. Each process is an independent entity. The failure of one process will not have a fatal impact on the entire program, improving the stability and fault tolerance of the program.
Portability: Multi-process programming can run on different operating systems because processes are the basic concepts provided by the operating system. This makes multi-process programming very portable and can be deployed and run on different platforms.
Disadvantages:
Resource consumption: Each process requires independent memory space and system resources, including opened files, network connections, etc. Multi-process programming may increase the system's resource consumption, especially when creating a large number of processes.
Context switching overhead: In multi-process programming, switching between processes requires saving and restoring the process's execution environment, which involves the overhead of context switching. Frequent process switching may result in additional overhead, affecting the performance of the program.
Data sharing and synchronization: Since data between multiple processes is isolated, sharing and synchronization must be done through specific mechanisms. This may involve the complexity of inter-process communication (IPC), such as queues, pipes, and shared memory. Handling data sharing and synchronization correctly is one of the challenges of multi-process programming.
Programming complexity: Multi-process programming may be more complex than single-threaded or multi-threaded programming. It is necessary to consider issues such as process creation and management, inter-process communication, data sharing and synchronization. Writing and debugging multi-process programs may require more work and experience.
Processes and threads:
- Before discussing multi-process, it is necessary to clarify the concepts of processes and threads.
- A process is an instance of a program that is running on the computer. Each process has its own address space, data stack and control information, and can perform tasks independently.
- A thread is an execution unit in a process and can be regarded as a lightweight process. Multiple threads share resources of the same process, including memory space, file descriptors, etc.
Multi-process programming has obvious advantages in parallel processing and resource isolation, but it also involves issues such as resource consumption, context switching overhead, data sharing and synchronization. In practical applications, developers should weigh the pros and cons and choose suitable programming models and tools based on specific scenarios.
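To make the memory-isolation point above concrete, here is a minimal sketch (the variable and function names are illustrative) showing that a child process modifies its own copy of a global variable, not the parent's:

from multiprocessing import Process

counter = 0  # module-level variable, copied into (or re-imported by) each process

def increment():
    global counter
    counter += 1
    print("In child process:", counter)   # prints 1

if __name__ == '__main__':
    p = Process(target=increment)
    p.start()
    p.join()
    # Still 0: the child changed its own copy, not the parent's memory
    print("In parent process:", counter)

A thread doing the same increment would change the value seen by the whole program, which is exactly the difference between threads sharing one address space and processes each having their own.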
2. Create a process
In Python, you can use the multiprocessing module to create and manage processes. This module provides a rich set of classes and functions for creating, starting, and managing processes.
1. Import the multiprocessing module
Before using the multiprocessing module, you need to import it first:
import multiprocessing
2. Create a process
The Process class can be used to create a process object; a target function is passed in as the execution logic of the process. You can also inherit from the Process class to define a custom process class.
import multiprocessing

def worker():
    # The logic the process executes
    pass

if __name__ == '__main__':
    process = multiprocessing.Process(target=worker)
In the example above, the worker function is the execution logic of the process. After the process object is created, it can be configured by setting parameters, calling methods, and so on.
3. Start the process
Call the process object's start() method to start the process; it will begin executing in the background.
process.start()
4. Process status
Process objects provide some methods to obtain and manage the status of a process:
- is_alive(): Check whether the process is still running.
- join([timeout]): Wait for the process to end. The optional timeout parameter specifies the maximum time to wait.
if process.is_alive():
    print("Process is running")

process.join()
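Putting the pieces together, here is a small, runnable sketch of the full lifecycle (create, start, check status, wait):

import multiprocessing
import time

def worker():
    print("Worker process started")
    time.sleep(1)          # simulate some work
    print("Worker process finished")

if __name__ == '__main__':
    process = multiprocessing.Process(target=worker)
    process.start()

    if process.is_alive():
        print("Process is running")

    process.join()         # block until the worker exits
    print("Process exit code:", process.exitcode)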
3. Inter-process communication
Inter-process communication (IPC) refers to the mechanisms for exchanging data and sharing information between different processes. In multi-process programming, processes usually need to transfer data, share state, or synchronize operations. Python provides a variety of IPC mechanisms, including queues (Queue), pipes (Pipe), and shared memory (Value, Array).
1. Queue
A queue is a commonly used inter-process communication method through which data can be passed between processes. Python's multiprocessing module provides the Queue class to implement queue communication between multiple processes. A process can put data into the queue with the put() method, and other processes can take data out of the queue with the get() method.
from multiprocessing import Queue

# Create a queue
queue = Queue()

# Process 1 puts data
queue.put(data)

# Process 2 gets data
data = queue.get()
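The snippet above is schematic (data stands for whatever object you pass). A runnable sketch with an actual producer and consumer process might look like this, using None as a hypothetical end-of-stream sentinel:

from multiprocessing import Process, Queue

def producer(queue):
    for item in ['a', 'b', 'c']:
        queue.put(item)        # put data into the queue
    queue.put(None)            # sentinel: tell the consumer we are done

def consumer(queue):
    while True:
        item = queue.get()     # blocks until data is available
        if item is None:
            break
        print("Consumed:", item)

if __name__ == '__main__':
    queue = Queue()
    p1 = Process(target=producer, args=(queue,))
    p2 = Process(target=consumer, args=(queue,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()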
2. Pipe
A pipe is another commonly used inter-process communication method through which two-way communication between processes can be achieved. Python's multiprocessing module provides the Pipe class for creating pipes. Pipe() returns two connected connection objects, one for each end of the pipe; by default the pipe is bidirectional, so each end can both send and receive data.
from multiprocessing import Pipe

# Create a pipe
conn1, conn2 = Pipe()

# Process 1 sends data
conn1.send(data)

# Process 2 receives data
data = conn2.recv()
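Again the snippet is schematic. A runnable sketch passing one end of the pipe to a child process (the names are illustrative) could look like this:

from multiprocessing import Process, Pipe

def child(conn):
    conn.send("hello from child")   # send a picklable Python object
    reply = conn.recv()             # the same end can also receive
    print("Child received:", reply)
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=child, args=(child_conn,))
    p.start()
    print("Parent received:", parent_conn.recv())
    parent_conn.send("hello from parent")
    p.join()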
3. Shared memory (Value, Array)
Shared memory is an efficient way to share data between multiple processes. Python's multiprocessing module provides the Value and Array classes for sharing data between processes: Value is used to share a single value, while Array is used to share an array.
from multiprocessing import Value, Array

# Create a shared value
shared_value = Value('i', 0)

# Create a shared array
shared_array = Array('i', [1, 2, 3, 4, 5])
When creating shared values and shared arrays, you need to specify the data type (such as integer or floating point) and the initial value. Processes can then share data by reading and writing the shared value or shared array.
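As a runnable sketch, the child process below updates a shared value and a shared array, and the parent sees the changes (by default a Value carries its own lock, accessible via get_lock()):

from multiprocessing import Process, Value, Array

def worker(shared_value, shared_array):
    with shared_value.get_lock():      # synchronize access to the value
        shared_value.value += 1
    for i in range(len(shared_array)):
        shared_array[i] *= 2

if __name__ == '__main__':
    shared_value = Value('i', 0)
    shared_array = Array('i', [1, 2, 3, 4, 5])
    p = Process(target=worker, args=(shared_value, shared_array))
    p.start()
    p.join()
    print(shared_value.value)          # 1: the parent sees the child's update
    print(list(shared_array))          # [2, 4, 6, 8, 10]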
4. Semaphore
Semaphores are mechanisms used to control access to shared resources. In multi-process programming, semaphores can be used to limit the number of processes that access a shared resource simultaneously.
from multiprocessing import Semaphore, Process
import time

def worker(semaphore, name):
    semaphore.acquire()
    print("Worker", name, "acquired semaphore")
    time.sleep(2)
    print("Worker", name, "released semaphore")
    semaphore.release()

if __name__ == '__main__':
    semaphore = Semaphore(2)
    processes = []
    for i in range(5):
        p = Process(target=worker, args=(semaphore, i))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()
In the above example, a semaphore is created with an initial value of 2, and then 5 processes are created. Each process tries to acquire the semaphore before doing its work: if the semaphore's value is greater than 0 the acquisition succeeds, otherwise the process blocks until another process releases the semaphore. After acquiring the semaphore, each process performs its task and releases the semaphore when finished.
5. Event (Event)
An event is a synchronization mechanism for multi-process communication, which allows one or more processes to wait for an event to occur before continuing to execute.
from multiprocessing import Event, Process
import time

def worker(event, name):
    print("Worker", name, "waiting for event")
    event.wait()
    print("Worker", name, "received event")
    time.sleep(2)
    print("Worker", name, "completed task")

if __name__ == '__main__':
    event = Event()
    processes = []
    for i in range(3):
        p = Process(target=worker, args=(event, i))
        processes.append(p)
        p.start()

    time.sleep(3)
    event.set()

    for p in processes:
        p.join()
In the above example, an event is created, and then 3 processes are created; each waits for the event before proceeding by calling the event.wait() method. After sleeping for 3 seconds, the main process sets the event with the event.set() method. At that point, all processes waiting on the event are awakened and continue with their tasks.
6. Condition variables
Condition variables are a mechanism for coordination and synchronization between multiple processes and can be used to control their order of execution.
from multiprocessing import Condition, Process
import time

def consumer(condition):
    with condition:
        print("Consumer is waiting")
        condition.wait()
        print("Consumer is consuming the product")

def producer(condition):
    time.sleep(2)              # give the consumer time to start waiting
    with condition:
        print("Producer is producing the product")
        condition.notify()

if __name__ == '__main__':
    condition = Condition()
    consumer_process = Process(target=consumer, args=(condition,))
    producer_process = Process(target=producer, args=(condition,))
    consumer_process.start()
    producer_process.start()
    consumer_process.join()
    producer_process.join()
In the above example, a condition variable is created, along with a consumer process and a producer process. The consumer waits for the condition to be satisfied by calling the condition.wait() method. After sleeping for 2 seconds, the producer produces the product and notifies the consumer through the condition.notify() method. The consumer then continues its task after receiving the notification.
4. Synchronization between processes
Inter-process synchronization is a mechanism that ensures multiple processes execute in a specific order or access shared resources mutually exclusively. Its purpose is to avoid race conditions and data inconsistency. Python provides a variety of mechanisms for synchronization between processes, including locks, semaphores, events, and condition variables.
1. Lock
A lock is a basic synchronization mechanism used to enforce mutually exclusive access to shared resources, ensuring that only one process can access the resource at any time. In Python, you can use the Lock class from the multiprocessing module to implement locks.
from multiprocessing import Lock, Process

def worker(lock, data):
    lock.acquire()
    try:
        # Operate on the shared resource
        pass
    finally:
        lock.release()

if __name__ == '__main__':
    lock = Lock()
    processes = []
    for i in range(5):
        p = Process(target=worker, args=(lock, i))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()
In the above example, each process acquires the lock before accessing the shared resource and releases it after completing the operation. This ensures that only one process can access the shared resource at a time, avoiding data races.
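To show the lock protecting something concrete, here is a sketch in which 5 processes increment a shared counter (the Value is created with lock=False because the explicit Lock does the synchronizing):

from multiprocessing import Lock, Process, Value

def worker(lock, counter):
    for _ in range(1000):
        lock.acquire()
        try:
            counter.value += 1     # read-modify-write is not atomic by itself
        finally:
            lock.release()

if __name__ == '__main__':
    lock = Lock()
    counter = Value('i', 0, lock=False)
    processes = [Process(target=worker, args=(lock, counter)) for _ in range(5)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print(counter.value)           # always 5000 with the lock in place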
2. Semaphore
A semaphore is a more flexible synchronization mechanism that allows multiple processes to access a resource at the same time while limiting how many may do so simultaneously. In Python, you can use the Semaphore class from the multiprocessing module to implement semaphores.
from multiprocessing import Semaphore, Process

def worker(semaphore, data):
    semaphore.acquire()
    try:
        # Operate on the shared resource
        pass
    finally:
        semaphore.release()

if __name__ == '__main__':
    semaphore = Semaphore(2)
    processes = []
    for i in range(5):
        p = Process(target=worker, args=(semaphore, i))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()
In the above example, a semaphore with an initial value of 2 is created. Each process tries to acquire the semaphore before accessing the shared resource; acquisition succeeds only if the semaphore's value is greater than 0, otherwise the process blocks. Once acquired, the process can do its work and release the semaphore when finished.
3. Event (Event)
An event is a synchronization mechanism used to implement waiting and notification between processes: one process can wait for an event to occur while another process triggers it. In Python, you can use the Event class from the multiprocessing module to implement events.
from multiprocessing import Event, Process

def worker(event, data):
    event.wait()
    # Execute the task

if __name__ == '__main__':
    event = Event()
    processes = []
    for i in range(5):
        p = Process(target=worker, args=(event, i))
        processes.append(p)
        p.start()

    # Trigger the event
    event.set()

    for p in processes:
        p.join()
In the above example, multiple processes wait for the event before executing their tasks by calling the wait() method. The main process calls the set() method to trigger the event, waking the waiting processes so they can continue execution.
4. Condition variables
A condition variable is a more sophisticated synchronization mechanism that allows processes to wait and notify each other based on specific conditions. In Python, you can use the Condition class from the multiprocessing module to implement condition variables.
from multiprocessing import Condition, Process

def consumer(condition, data):
    with condition:
        # condition_is_met() is a placeholder for the actual predicate
        while not condition_is_met():
            condition.wait()
        # Consume data from the shared resource

def producer(condition, data):
    with condition:
        # Generate data and update the shared resource
        condition.notify_all()

if __name__ == '__main__':
    condition = Condition()
    processes = []
    for i in range(5):
        p = Process(target=consumer, args=(condition, i))
        processes.append(p)
        p.start()

    # data is a placeholder for the shared resource being produced
    producer_process = Process(target=producer, args=(condition, data))
    producer_process.start()

    for p in processes:
        p.join()
    producer_process.join()
In the above example, the consumer process checks whether the condition is satisfied before doing its work; if not, it calls the wait() method to wait. After the producer process generates data and updates the shared resource, it calls the condition.notify_all() method to notify all waiting consumer processes that the condition has been met. The awakened consumers recheck the condition and perform their tasks.
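Since the snippet above uses placeholders (condition_is_met, data), here is a concrete, runnable sketch where the predicate is an item count held in shared memory (created with lock=False because the condition's own lock protects it):

from multiprocessing import Condition, Process, Value

def consumer(condition, items):
    with condition:
        while items.value == 0:        # the concrete condition to wait for
            condition.wait()           # releases the lock while waiting
        items.value -= 1
        print("Consumed one item,", items.value, "left")

def producer(condition, items):
    with condition:
        items.value += 3               # produce three items
        condition.notify_all()         # wake all waiting consumers

if __name__ == '__main__':
    condition = Condition()
    items = Value('i', 0, lock=False)
    consumers = [Process(target=consumer, args=(condition, items)) for _ in range(3)]
    for c in consumers:
        c.start()
    producer_process = Process(target=producer, args=(condition, items))
    producer_process.start()
    for c in consumers:
        c.join()
    producer_process.join()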
5. Process pool
A process pool is a mechanism for managing and scheduling multiple processes; it can process parallel tasks efficiently and improve program performance. In Python, process pools are usually implemented with the Pool class provided by the multiprocessing module.
The working principle of a process pool is as follows:
- When a process pool is created, a specified number of processes are started and placed into the pool.
- Processes in the pool will wait for the main process to submit tasks.
- The main process assigns work by submitting tasks to the pool.
- The process in the process pool executes the task and returns the result to the main process.
- The main process obtains the result of the task and continues to perform other operations.
- When all tasks are completed, the main process closes the process pool.
1. Create a process pool
To use a process pool, you need to create a Pool object and specify the number of processes in the pool. Usually, you can use the multiprocessing.cpu_count() function to get the number of CPU cores of the current system and then size the pool as needed.
from multiprocessing import Pool, cpu_count

pool = Pool(processes=cpu_count())
In the above example, a process pool is created with the same number of processes as the number of CPU cores of the system.
2. Submit a task
Once the process pool is created, tasks can be submitted to it with the apply(), map(), or imap() methods.
The apply() method submits a single task, waits for it to complete, and returns the result.
result = pool.apply(function, args=(arg1, arg2))
The map() method submits multiple tasks and returns a list of results in the order the tasks were submitted.
results = pool.map(function, iterable)
The imap() method is also used to submit multiple tasks, but the results can be obtained one by one through the iterator without waiting for all tasks to complete.
results = pool.imap(function, iterable)
In the examples above, function is the function to execute, args contains the function's arguments, and iterable is an iterable object such as a list or tuple.
3. Obtain the results
The apply() method blocks the main process until the task completes and returns its result. The map() method waits for all tasks to complete and returns the results as a list in submission order. With the imap() method, results can be fetched one by one through the returned iterator.
for result in results:
    print(result)
In the above example, a for loop iterates over the results one by one and processes them.
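As a runnable sketch of the difference, imap below yields each square as soon as it is ready (in submission order), so results are printed as they become available rather than all at the end:

import time
from multiprocessing import Pool

def slow_square(x):
    time.sleep(1)              # simulate a slow computation
    return x * x

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        # imap returns an iterator; results arrive as tasks complete
        for result in pool.imap(slow_square, range(8)):
            print(result)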
4. Close the process pool
After all tasks are completed, the process pool needs to be explicitly closed to free up resources.
pool.close()
pool.join()
After the close() method is called, the process pool no longer accepts new tasks. Calling the join() method blocks the main process until all tasks have completed.
5. Example of using process pool
from multiprocessing import Pool

# Define a task function
def square(x):
    return x ** 2

if __name__ == '__main__':
    # Create a process pool
    with Pool(processes=4) as pool:
        # Submit tasks to the process pool
        results = pool.map(square, range(10))

    # Print the results
    print(results)
In the above example, a task function square is first defined, which takes a value as a parameter and returns the square of the value.
Inside the if __name__ == '__main__': block, a process pool with 4 processes is created. Using the with statement ensures that the pool is closed correctly after use.
Then, tasks are submitted to the pool by calling pool.map(square, range(10)). The map() method takes the task function square and the iterable range(10) as parameters, passes each element of the iterable to the task function, and returns a list of results. Finally, the result list, i.e. the square of each value, is printed.
Note that when using a process pool, you need to place the main program code inside if __name__ == '__main__': to ensure it is not re-executed when child processes import the module.
Here is a more complex multi-process example that shows how to use a process pool to run multiple tasks and collect the results as the tasks complete.
import time
from multiprocessing import Pool

# Define a task function
def process_data(data):
    # Simulate a time-consuming operation
    time.sleep(1)
    # Return the processing result (here: the uppercased string)
    return data.upper()

if __name__ == '__main__':
    # Create a process pool
    with Pool(processes=3) as pool:
        # Prepare the data
        data_list = ['apple', 'banana', 'cherry', 'date', 'elderberry']
        # Submit tasks to the process pool asynchronously
        results = [pool.apply_async(process_data, args=(data,)) for data in data_list]
        # Wait for all tasks to complete and get the results
        final_results = [result.get() for result in results]

    # Print the results
    for result in final_results:
        print(result)
In the above example, instead of the pool's map() method, tasks are submitted asynchronously with the apply_async() method, and each task's result is obtained through the get() method.
In if __name__ == '__main__':, a process pool is created, with the specified number of processes being 3. Using the with statement ensures that the process pool is closed correctly after use. Then, a data list data_list is prepared, which contains the data to be processed.
Through a list comprehension, pool.apply_async(process_data, args=(data,)) submits each task asynchronously to the process pool. The apply_async() method takes the task function process_data and the data as parameters and returns an AsyncResult object representing the asynchronous task; these objects are stored in the results list.
Next, another list comprehension waits for all tasks to complete and fetches each result through the get() method, storing them in the final_results list. Finally, a for loop iterates over final_results and prints the processing result of each task.
The advantage of a process pool is that it automatically manages and schedules multiple processes, makes full use of system resources, and improves a program's parallel execution capability. By sizing the pool sensibly, you can achieve good concurrency without excessive resource consumption. Note, however, that process pools are best suited to CPU-bound tasks that genuinely benefit from parallel execution; for IO-intensive tasks, the cost of creating separate processes (each a copy of, or a fresh interpreter for, the main process) is usually not worth it, and a thread pool is the better fit.
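For comparison, here is a minimal sketch of the thread-pool alternative for IO-bound work, using multiprocessing.dummy (a thread-based drop-in replacement for Pool with the same API); fetch and the url strings are placeholders for real IO:

from multiprocessing.dummy import Pool as ThreadPool
import time

def fetch(url):
    time.sleep(1)      # stand-in for an IO wait such as a network request
    return url

if __name__ == '__main__':
    urls = ['url-%d' % i for i in range(8)]    # hypothetical work items
    with ThreadPool(8) as pool:
        results = pool.map(fetch, urls)        # threads overlap the waiting
    print(results)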
That concludes this detailed look at multi-process programming with Python's multiprocessing module. For more on Python multiprocessing, please check out my other related articles!