SoFunction
Updated on 2024-12-20

Multiprocessing and process pools in Python (the multiprocessing library)

Environment: Win7 + Python 2.7

I have always wanted to learn multiprocessing and multithreading, but I had only read the basics and simple introductions and couldn't see how to apply them. Some time ago I came across a crawler project on GitHub that used multiple processes and threads, and while reading it I looked up related material on Baidu. I'm now writing down some of that knowledge and a few applications as a record.

First of all, what is a process? A process is an instance of a program executing on the computer: when you run a program, a process is started. Processes are divided into system processes and user processes. Processes that carry out the various functions of the operating system are system processes, and they run as part of the operating system itself; all processes started by you are user processes. The process is the unit of resource allocation in the operating system.

In plain terms: in Task Manager, processes whose user name is SYSTEM are system processes and those labeled Administrator are user processes; in addition, NETWORK SERVICE denotes network services and LOCAL SERVICE denotes local services. For more details about processes see Wikipedia; I'll save some effort here, otherwise I'd never get back on topic.

I. Simple use of multiprocessing

As shown, multiprocessing provides many functions, many of which I don't yet understand, so here I only cover what I currently understand.

Creating a process: Process(target=the function to run, name=a custom process name (optional), args=(a tuple of arguments))

Methods:

  1. is_alive(): returns whether the process is still running.
  2. join([timeout]): waits for the child process to finish before the next statement runs. timeout is the maximum number of seconds to wait; it is useful when a process may block and you want the program to continue anyway.
  3. run(): the method that represents the process's activity. The default run() calls the target given to the constructor; if no target was specified, it does nothing, so you can override run() instead.
  4. start(): starts the process, arranging for run() to be called in a separate child process (calling run() directly just runs it in the current process).
  5. terminate(): terminates the process (SIGTERM on Unix, TerminateProcess() on Windows). Terminating processes is not that simple; it seems the psutil package handles this better, and I'll write more about it once I've had a chance to learn it.
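The lifecycle methods above can be tried together in a short sketch. It is written for Python 3 (unlike the Python 2.7 examples in this article), and the function names and durations are made up for illustration: the child sleeps for 5 seconds, join(timeout=0.5) gives up after half a second, and terminate() then stops the child.

```python
import time
from multiprocessing import Process


def slow_worker(seconds):
    # Hypothetical task: just sleep for `seconds` seconds.
    time.sleep(seconds)


def lifecycle_demo():
    p = Process(target=slow_worker, args=(5,))
    p.start()                      # slow_worker(5) now runs in a child process
    alive_after_start = p.is_alive()

    p.join(timeout=0.5)            # wait at most 0.5 s, then carry on
    alive_after_timeout = p.is_alive()

    p.terminate()                  # stop the child without waiting for it to finish
    p.join()                       # reap the child so is_alive()/exitcode are final
    return alive_after_start, alive_after_timeout, p.is_alive(), p.exitcode


if __name__ == "__main__":
    print(lifecycle_demo())
```

On Linux this should print (True, True, False, -15): the child is alive after start() and after the timed-out join(), and once terminated its exitcode is -SIGTERM, matching the "-N" rule for exitcode.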

Note that a Process object does not run until start() is called.

Properties:

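  1. authkey: the process's authentication key. In the documentation, the authkey setter is only described as "Set authorization key of process"; I have not found an application example yet, so how is this key used? The article I read does not say. (According to the official documentation, a new process inherits its parent's authkey, and authentication keys are used by multiprocessing for HMAC-based authentication of connections, for example by managers and by multiprocessing.connection's Listener and Client.)
  2. daemon: if True, the process is terminated automatically when its parent process exits, and it cannot create child processes of its own; it must be set before start().
  3. exitcode: None while the process has not yet terminated; otherwise the exit code (0 after a normal exit). A negative value -N means the process was terminated by signal N.
  4. name: the name of the process, which can be customized.
  5. pid: each process has a unique process ID (PID).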
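To see these attributes on real objects, here is a minimal sketch, written for Python 3 rather than the Python 2.7 used elsewhere in this article. The functions report_pid and exit_with_code are hypothetical helpers made up for the demo; a Queue carries the child's own os.getpid() back to the parent so it can be compared with p.pid.

```python
import os
import sys
from multiprocessing import Process, Queue


def report_pid(q):
    # Runs in the child: send the child's own PID back to the parent.
    q.put(os.getpid())


def exit_with_code():
    # Hypothetical failing task: leave the child with exit status 3.
    sys.exit(3)


def attributes_demo():
    q = Queue()
    p = Process(target=report_pid, args=(q,), name="reporter")
    p.daemon = True                 # must be set before start()
    code_before_start = p.exitcode  # None: the process has not run yet
    p.start()
    child_pid = q.get()             # read the queue before join() to avoid blocking
    p.join()

    f = Process(target=exit_with_code)  # no name given, so a default one is used
    f.start()
    f.join()

    return {
        "name": p.name,
        "daemon": p.daemon,
        "pid_matches": p.pid == child_pid,
        "exitcode_before_start": code_before_start,
        "exitcode_normal": p.exitcode,
        "exitcode_sys_exit": f.exitcode,
        "default_name": f.name,
    }


if __name__ == "__main__":
    print(attributes_demo())
```

If it behaves as expected, exitcode goes from None (not started) to 0 after a normal return, a child that calls sys.exit(3) ends with exitcode 3, and an unnamed process gets a default name of the form Process-N.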

1. Process(), start(), join()

# -*- coding:utf-8 -*-
from multiprocessing import Process
import time

def fun1(t):
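    print 'this is fun1', time.ctime()
    time.sleep(t)  # simulate t seconds of work
    print 'fun1 finish', time.ctime()

def fun2(t):
    print 'this is fun2', time.ctime()
    time.sleep(t)
    print 'fun2 finish', time.ctime()

if __name__ == '__main__':
    a = time.time()
    p1 = Process(target=fun1, args=(4,))
    p2 = Process(target=fun2, args=(6,))
    p1.start()  # both children start running now
    p2.start()
    p1.join()   # wait for p1 to finish
    p2.join()   # then wait for p2
    b = time.time()
    print 'finish', b - a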

There are two processes here, p1 and p2. The 4 in args=(4,) is the argument passed to fun1. args must be a tuple, so a single argument needs the trailing comma, and two or more arguments are written args=(arg1, arg2, ...). start() launches each process, and join() makes the main process wait until p1 and p2 have finished before moving on. The results below show that fun2 and fun1 start at essentially the same time, and print 'finish', b-a runs only after both have finished (fun1 sleeps for 4 seconds, fun2 for 6 seconds), so the total is about 6 seconds rather than 10.

this is fun2 Mon Jun 05 13:48:04 2017
this is fun1 Mon Jun 05 13:48:04 2017
fun1 finish Mon Jun 05 13:48:08 2017
fun2 finish Mon Jun 05 13:48:10 2017
finish 6.20300006866

Process finished with exit code 0
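Since args must be a tuple, a one-argument call needs the trailing comma. The sketch below (Python 3; describe is a hypothetical target) passes several positional arguments plus a keyword argument through kwargs, and shows that args=(4) without the comma is rejected, because (4) is just the integer 4.

```python
from multiprocessing import Process, Queue


def describe(q, label, seconds, unit="s"):
    # Hypothetical target: report the arguments it received.
    q.put("%s sleeps %d%s" % (label, seconds, unit))


def args_demo():
    q = Queue()
    # Positional arguments: a tuple, in the same order as the parameters.
    # Keyword arguments: a dict passed as kwargs.
    p = Process(target=describe, args=(q, "fun1", 4), kwargs={"unit": " seconds"})
    p.start()
    message = q.get()
    p.join()
    return message


def missing_comma_demo():
    # (4) is the integer 4, not a one-element tuple, so the Process
    # constructor fails when it tries to turn args into a tuple.
    try:
        Process(target=describe, args=(4))
    except TypeError as exc:
        return exc
    return None


if __name__ == "__main__":
    print(args_demo())
    print(repr(missing_comma_demo()))
```

args_demo() should return 'fun1 sleeps 4 seconds', and missing_comma_demo() returns the TypeError raised while the Process object is being created, before any child is started.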

Let's take a look at what happens when start() and join() are in different positions

# -*- coding:utf-8 -*-
from multiprocessing import Process
import time

def fun1(t):
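    print 'this is fun1', time.ctime()
    time.sleep(t)
    print 'fun1 finish', time.ctime()

def fun2(t):
    print 'this is fun2', time.ctime()
    time.sleep(t)
    print 'fun2 finish', time.ctime()

if __name__ == '__main__':
    a = time.time()
    p1 = Process(target=fun1, args=(4,))
    p2 = Process(target=fun2, args=(6,))
    p1.start()
    p1.join()   # p1 must finish before p2 is even started
    p2.start()
    p2.join()
    b = time.time()
    print 'finish', b - a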

Results.

this is fun1 Mon Jun 05 14:19:28 2017
fun1 finish Mon Jun 05 14:19:32 2017
this is fun2 Mon Jun 05 14:19:32 2017
fun2 finish Mon Jun 05 14:19:38 2017
finish 10.1229999065

Process finished with exit code 0

Now fun1 runs first, fun2 starts only after it has finished, and then print 'finish' runs. In other words, process p1 runs to completion before process p2 is started: that is the effect of join(). Next, let's comment out the second join() (p2.join()) and see what happens.

# -*- coding:utf-8 -*-
from multiprocessing import Process
import time

def fun1(t):
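    print 'this is fun1', time.ctime()
    time.sleep(t)
    print 'fun1 finish', time.ctime()

def fun2(t):
    print 'this is fun2', time.ctime()
    time.sleep(t)
    print 'fun2 finish', time.ctime()

if __name__ == '__main__':
    a = time.time()
    p1 = Process(target=fun1, args=(4,))
    p2 = Process(target=fun2, args=(6,))
    p1.start()
    p2.start()
    p1.join()   # only p1 is waited for
    #p2.join()
    b = time.time()
    print 'finish', b - a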

Results.

this is fun1 Mon Jun 05 14:23:57 2017
this is fun2 Mon Jun 05 14:23:58 2017
fun1 finish Mon Jun 05 14:24:01 2017
finish 4.05900001526
fun2 finish Mon Jun 05 14:24:04 2017

Process finished with exit code 0

This time p1 and p2 start together. Because only p1 is joined, the main program waits for p1 to finish (about 4 seconds) and then immediately runs print 'finish' without waiting for p2. fun2 finishes afterwards, and the program ends only once p2 has finished.
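A common pattern that follows from these three experiments is to start every process first and only then join them all. The sketch below (Python 3; nap and elapsed are hypothetical names, and the durations are shortened to 0.4 s and 0.6 s) times both orderings: with start()/join() pairs the total is roughly the sum of the sleeps, while starting everything first brings it down to roughly the longest sleep.

```python
import time
from multiprocessing import Process


def nap(seconds):
    # Hypothetical task: sleep for `seconds` seconds.
    time.sleep(seconds)


def elapsed(durations, join_each_immediately):
    """Run one child per duration and return the wall-clock time taken."""
    start = time.monotonic()
    procs = [Process(target=nap, args=(d,)) for d in durations]
    if join_each_immediately:
        # start()/join() pairs: each child finishes before the next one starts.
        for p in procs:
            p.start()
            p.join()
    else:
        # Start every child first, then wait for all of them, so they overlap.
        for p in procs:
            p.start()
        for p in procs:
            p.join()
    return time.monotonic() - start


if __name__ == "__main__":
    print("sequential: %.2f s" % elapsed([0.4, 0.6], True))
    print("overlapping: %.2f s" % elapsed([0.4, 0.6], False))
```

The exact numbers depend on process start-up cost on your machine, but the sequential run should take at least 1 second and the overlapping run only a little over 0.6 seconds.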

2. name, daemon, is_alive()

# -*- coding:utf-8 -*-
from multiprocessing import Process
import time

def fun1(t):
    print 'this is fun1', time.ctime()
    time.sleep(t)
    print 'fun1 finish', time.ctime()

def fun2(t):
    print 'this is fun2', time.ctime()
    time.sleep(t)
    print 'fun2 finish', time.ctime()

if __name__ == '__main__':
    a = time.time()
    p1 = Process(name='fun1 process', target=fun1, args=(4,))
    p2 = Process(name='fun2 process', target=fun2, args=(6,))
    p1.daemon = True  # must be set before start()
    p2.daemon = True
    p1.start()
    p2.start()
    p1.join()
    print p1, p2
    print 'Process 1:', p1.is_alive(), 'Process 2:', p2.is_alive()
    #p2.join()
    b = time.time()
    print 'finish', b - a

Results.

this is fun2 Mon Jun 05 14:43:49 2017
this is fun1 Mon Jun 05 14:43:49 2017
fun1 finish Mon Jun 05 14:43:53 2017
<Process(fun1 process, stopped daemon)> <Process(fun2 process, started daemon)>
Process 1: False Process 2: True
finish 4.06500005722

Process finished with exit code 0

As you can see, name sets the process's name. When print 'Process 1:', p1.is_alive(), 'Process 2:', p2.is_alive() runs, p1 has already finished (False) while p2 is still running (True). Because p2 is not joined, the main process simply carries on. And since p2.daemon = True, p2 is terminated automatically when the main process exits, even though it has not finished, so 'fun2 finish' is never printed.
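The daemon entry in the attribute list also says a daemonic process cannot create processes of its own. Here is a minimal sketch to check that (Python 3; noop and try_to_spawn are hypothetical helpers): the daemonic child tries to start a grandchild and reports what happened through a Queue. Note that CPython enforces this with an assert statement inside start(), so the check, and the AssertionError it raises, disappears if Python is run with -O.

```python
from multiprocessing import Process, Queue


def noop():
    # Hypothetical grandchild task that does nothing.
    pass


def try_to_spawn(q):
    # Runs inside a daemonic child and tries to start a grandchild.
    try:
        grandchild = Process(target=noop)
        grandchild.start()
        grandchild.join()
        q.put("started")
    except AssertionError as exc:
        q.put(str(exc))


def daemon_child_demo():
    q = Queue()
    p = Process(target=try_to_spawn, args=(q,))
    p.daemon = True                # must be set before start()
    p.start()
    outcome = q.get()              # read before join() to avoid blocking
    p.join()
    return outcome


if __name__ == "__main__":
    print(daemon_child_demo())
```

Without -O, daemon_child_demo() should return the message 'daemonic processes are not allowed to have children'.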

3. run()

If no target function is given when creating the Process, the default run() is executed, and it does nothing.

# -*- coding:utf-8 -*-
from multiprocessing import Process
import time

def fun1(t):
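    print 'this is fun1', time.ctime()
    time.sleep(t)
    print 'fun1 finish', time.ctime()

def fun2(t):
    print 'this is fun2', time.ctime()
    time.sleep(t)
    print 'fun2 finish', time.ctime()

if __name__ == '__main__':
    a = time.time()
    p = Process()  # no target: the default run() does nothing
    p.start()
    p.join()
    b = time.time()
    print 'finish', b - a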

Results.

finish 0.0840001106262

As you can see from the result, process p does nothing. To make it do some work, we can assign a function to p.run:

When the target function has no arguments:

# -*- coding:utf-8 -*-
from multiprocessing import Process
import time

def fun1():
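    print 'this is fun1', time.ctime()
    time.sleep(2)
    print 'fun1 finish', time.ctime()

def fun2(t):
    print 'this is fun2', time.ctime()
    time.sleep(t)
    print 'fun2 finish', time.ctime()

if __name__ == '__main__':
    a = time.time()
    p = Process()
    p.run = fun1  # assign the function itself; the child calls p.run()
    p.start()
    p.join()
    b = time.time()
    print 'finish', b - a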

Results.

this is fun1 Mon Jun 05 16:34:41 2017
fun1 finish Mon Jun 05 16:34:43 2017
finish 2.11500000954

Process finished with exit code 0

When the target function has arguments:

# -*- coding:utf-8 -*-
from multiprocessing import Process
import time

def fun1(t):
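    print 'this is fun1', time.ctime()
    time.sleep(t)
    print 'fun1 finish', time.ctime()

def fun2(t):
    print 'this is fun2', time.ctime()
    time.sleep(t)
    print 'fun2 finish', time.ctime()

if __name__ == '__main__':
    a = time.time()
    p = Process()
    p.run = fun1(2)  # calls fun1(2) here in the parent and assigns its result (None)
    p.start()
    p.join()
    b = time.time()
    print 'finish', b - a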

Results.

this is fun1 Mon Jun 05 16:36:27 2017
fun1 finish Mon Jun 05 16:36:29 2017
Process Process-1:
Traceback (most recent call last):
 File "E:\Anaconda2\lib\multiprocessing\", line 258, in _bootstrap
 ()
TypeError: 'NoneType' object is not callable
finish 2.0529999733

Process finished with exit code 0

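The target function with arguments raises an exception. The reason is that p.run = fun1(2) does not assign the function: it calls fun1(2) immediately in the main process (which is why 'this is fun1' and 'fun1 finish' are printed before the traceback) and assigns its return value, None, to p.run. When the child process starts, it calls self.run(), i.e. None(), which raises TypeError: 'NoneType' object is not callable. To run a target function with arguments, pass them when creating the process, as in the earlier examples: Process(target=fun1, args=(2,)).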
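As an alternative to target/args, the official documentation also allows overriding run() in a subclass of Process. A minimal sketch, assuming Python 3 (in Python 2.7 write Process.__init__(self) instead of super().__init__()); MessageProcess and its attributes are hypothetical names. The values run() needs are stored on the object in __init__, so run() itself takes no arguments.

```python
import time
from multiprocessing import Process, Queue


class MessageProcess(Process):
    """Hypothetical Process subclass: run() reads its inputs from attributes."""

    def __init__(self, q, label, seconds):
        super().__init__()          # the base initializer must always be called
        self.q = q
        self.label = label
        self.seconds = seconds

    def run(self):
        # Executed in the child process once start() is called.
        time.sleep(self.seconds)
        self.q.put("%s slept %s s" % (self.label, self.seconds))


def subclass_demo():
    q = Queue()
    p = MessageProcess(q, "fun1", 0.2)
    p.start()
    message = q.get()               # read before join() to avoid blocking
    p.join()
    return message


if __name__ == "__main__":
    print(subclass_demo())
```

subclass_demo() should return 'fun1 slept 0.2 s'. Because start() is used, run() executes in the child, not in the parent as in the p.run = fun1(2) case.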

II. Process pool

When only a few, or even a dozen, processes are needed, Process is convenient enough. But if you need hundreds or thousands of tasks, creating a Process for each one is clumsy. multiprocessing therefore provides the Pool class, i.e. a process pool: you set the maximum number of processes that may run at once, the pool runs at most that many at a time, and whenever one finishes its task, the next waiting task is started.

Pool(processes=num): sets the number of worker processes; when one finishes a task, the next task is handed to it.

apply_async(function, (arguments,)): non-blocking; the arguments must be a tuple.

apply(function, (arguments,)): blocking.

close(): closes the pool; no new tasks can be added.

terminate(): stops the worker processes immediately; unfinished tasks are not processed.

join(): waits for the processes to finish, as with Process; it must be called after close() or terminate().
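The calls listed above fit together as in this sketch (Python 3; square is a hypothetical task function). It also shows the return values, which the examples below do not use: apply() hands back the function's result directly, while apply_async() immediately returns an AsyncResult object whose get() method yields the result once the task is done.

```python
from multiprocessing import Pool


def square(x):
    # Hypothetical task function.
    return x * x


def pool_api_demo():
    pool = Pool(processes=3)                 # at most three worker processes
    blocking = pool.apply(square, (4,))      # blocks, then returns the value itself
    # apply_async() returns an AsyncResult immediately for each task.
    pending = [pool.apply_async(square, (i,)) for i in range(5)]
    pool.close()                             # no new tasks may be submitted
    pool.join()                              # wait for the workers to exit
    return blocking, [r.get() for r in pending]


if __name__ == "__main__":
    print(pool_api_demo())
```

pool_api_demo() should return (16, [0, 1, 4, 9, 16]). close() and join() are called explicitly here to mirror the article; note that using the pool in a with block calls terminate() on exit, not close() and join().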

1. A single process pool

# -*- coding:utf-8 -*-
from multiprocessing import Pool
import time

def fun1(t):
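    print 'this is fun1', time.ctime()
    time.sleep(t)
    print 'fun1 finish', time.ctime()

def fun2(t):
    print 'this is fun2', time.ctime()
    time.sleep(t)
    print 'fun2 finish', time.ctime()

if __name__ == '__main__':
    a = time.time()
    pool = Pool(processes=3) # Can run three processes at once
    for i in range(3, 8):
        pool.apply_async(fun1, (i,))  # non-blocking: returns immediately
    pool.close()  # no more tasks can be added
    pool.join()   # wait for all tasks to finish
    b = time.time()
    print 'finish', b - a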

Results.

this is fun1 Mon Jun 05 15:15:38 2017
this is fun1 Mon Jun 05 15:15:38 2017
this is fun1 Mon Jun 05 15:15:38 2017
fun1 finish Mon Jun 05 15:15:41 2017
this is fun1 Mon Jun 05 15:15:41 2017
fun1 finish Mon Jun 05 15:15:42 2017
this is fun1 Mon Jun 05 15:15:42 2017
fun1 finish Mon Jun 05 15:15:43 2017
fun1 finish Mon Jun 05 15:15:47 2017
fun1 finish Mon Jun 05 15:15:49 2017
finish 11.1370000839

Process finished with exit code 0

The results show that with the limit set to three processes, three tasks start at the same time at 15:15:38. When the first one finishes (the 3-second task), the next task starts, and so on until every task in the pool has run; only then does the main process execute b = time.time() and print 'finish', b-a. This example uses non-blocking apply_async(); let's compare it with blocking apply():

# -*- coding:utf-8 -*-
from multiprocessing import Pool
import time

def fun1(t):
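    print 'this is fun1', time.ctime()
    time.sleep(t)
    print 'fun1 finish', time.ctime()

def fun2(t):
    print 'this is fun2', time.ctime()
    time.sleep(t)
    print 'fun2 finish', time.ctime()

if __name__ == '__main__':
    a = time.time()
    pool = Pool(processes=3) # Can run three processes at once
    for i in range(3, 8):
        pool.apply(fun1, (i,))  # blocking: waits for this task to finish
    pool.close()
    pool.join()
    b = time.time()
    print 'finish', b - a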

Results.

this is fun1 Mon Jun 05 15:59:26 2017
fun1 finish Mon Jun 05 15:59:29 2017
this is fun1 Mon Jun 05 15:59:29 2017
fun1 finish Mon Jun 05 15:59:33 2017
this is fun1 Mon Jun 05 15:59:33 2017
fun1 finish Mon Jun 05 15:59:38 2017
this is fun1 Mon Jun 05 15:59:38 2017
fun1 finish Mon Jun 05 15:59:44 2017
this is fun1 Mon Jun 05 15:59:44 2017
fun1 finish Mon Jun 05 15:59:51 2017
finish 25.1610000134

Process finished with exit code 0

As you can see, with blocking apply() each task runs only after the previous one has finished, so the total time is the sum of all the sleeps (3+4+5+6+7 = 25 seconds). In general we use non-blocking apply_async().
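Since apply_async() does not block, another way to collect results is its callback parameter: the callback is called with each task's return value in the parent process as soon as that task finishes. A sketch, assuming Python 3; cube and callback_demo are hypothetical names.

```python
from multiprocessing import Pool


def cube(x):
    # Hypothetical task function.
    return x ** 3


def callback_demo():
    finished = []
    pool = Pool(processes=3)
    for i in range(5):
        # The callback runs in the parent process as each task completes,
        # so the completion order may differ from the submission order.
        pool.apply_async(cube, (i,), callback=finished.append)
    pool.close()
    pool.join()   # also waits for the result-handler thread that runs callbacks
    return sorted(finished)


if __name__ == "__main__":
    print(callback_demo())
```

callback_demo() should return [0, 1, 8, 27, 64]; the list is sorted because the order in which tasks complete is not guaranteed.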

2. Multiple functions in one process pool

The above runs a single function in the pool. To run several functions in the same pool, we can loop over them with for; let's go straight to the code:

# -*- coding:utf-8 -*-
from multiprocessing import Pool
import time

def fun1(t):
    print 'this is fun1', time.ctime()
    time.sleep(t)
    print 'fun1 finish', time.ctime()

def fun2(t):
    print 'this is fun2', time.ctime()
    time.sleep(t)
    print 'fun2 finish', time.ctime()

if __name__ == '__main__':
    a = time.time()
    pool = Pool(processes=3) # Can run three processes at once
    for fun in [fun1, fun2]:
        for i in range(3, 8):
            pool.apply_async(fun, (i,))
    pool.close()
    pool.join()
    b = time.time()
    print 'finish', b - a

Results.

this is fun1 Mon Jun 05 16:04:38 2017
this is fun1 Mon Jun 05 16:04:38 2017
this is fun1 Mon Jun 05 16:04:38 2017
fun1 finish Mon Jun 05 16:04:41 2017
this is fun1 Mon Jun 05 16:04:41 2017
fun1 finish Mon Jun 05 16:04:42 2017
this is fun1 Mon Jun 05 16:04:42 2017
fun1 finish Mon Jun 05 16:04:43 2017
this is fun2 Mon Jun 05 16:04:43 2017
fun2 finish Mon Jun 05 16:04:46 2017
this is fun2 Mon Jun 05 16:04:46 2017
fun1 finish Mon Jun 05 16:04:47 2017
this is fun2 Mon Jun 05 16:04:47 2017
fun1 finish Mon Jun 05 16:04:49 2017
this is fun2 Mon Jun 05 16:04:49 2017
fun2 finish Mon Jun 05 16:04:50 2017
this is fun2 Mon Jun 05 16:04:50 2017
fun2 finish Mon Jun 05 16:04:52 2017
fun2 finish Mon Jun 05 16:04:55 2017
fun2 finish Mon Jun 05 16:04:57 2017
finish 19.1670000553

Process finished with exit code 0

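As you can see, the pool takes tasks in the order they were submitted: the fun1 tasks start first, and each fun2 task starts as soon as a worker becomes free, so the last fun1 tasks overlap with the first fun2 tasks.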

Also, if the function takes no arguments, just call pool.apply_async(function) without an argument tuple.
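One caveat with the non-blocking call, shown here with the no-argument form: if the task raises an exception, the error is not reported by default and pool.join() still returns normally; it only surfaces when you call get() on the AsyncResult. A sketch under Python 3, with always_fails as a hypothetical task.

```python
from multiprocessing import Pool


def always_fails():
    # Hypothetical task with no arguments that raises an error.
    raise ValueError("something went wrong in the worker")


def hidden_error_demo():
    pool = Pool(processes=2)
    result = pool.apply_async(always_fails)  # no argument tuple needed
    pool.close()
    pool.join()                    # returns normally even though the task failed
    try:
        result.get()               # the worker's exception is re-raised here
    except ValueError as exc:
        return result.successful(), str(exc)
    return result.successful(), None


if __name__ == "__main__":
    print(hidden_error_demo())
```

hidden_error_demo() should return (False, 'something went wrong in the worker'), so it is worth calling get() on every AsyncResult even when you don't need the return value.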

While learning to write these programs, I ran into cases where I ran the code directly without if __name__ == '__main__': and the results were wrong. After looking it up, I learned that to use the multiprocessing module on Windows, the code in the current .py file that creates processes must be placed under the if __name__ == '__main__': statement, otherwise the module does not work properly. The reason given is that, during execution, the .py file you write is read in again as a module: Windows has no fork(), so each child process starts a fresh Python interpreter and imports your main script. The code must therefore check whether it is running as __main__, so that only the original program creates processes. That is to say:

if __name__ == '__main__':
    # do something

I don't fully understand this part yet, but I hope to understand it better later.

While studying processes, I also came across the Queue class and the threading module, which are often used together with processes; I'll write about them when I have time. I hope this helps with your learning, and I hope you will continue to support me.