SoFunction
Updated on 2024-10-29

Trying out multi-threaded programming in Python

Multitasking can be done by multiple processes or by multiple threads within a process.

We mentioned earlier that processes are made up of a number of threads, and that a process has at least one thread.

Since threads are units of execution directly supported by the operating system, high-level languages usually have built-in support for multithreading, and Python is no exception; moreover, Python's threads are real POSIX threads, not simulated ones.

Python's standard library provides two modules: _thread (named thread in Python 2) and threading. _thread is the low-level module, and threading is the high-level module that wraps _thread. In the vast majority of cases, we only need the high-level threading module.

Starting a thread is as simple as passing a function in to create a Thread instance and then calling start() to begin execution:

import time, threading

# Code executed in the new thread:
def loop():
    print('thread %s is running...' % threading.current_thread().name)
    n = 0
    while n < 5:
        n = n + 1
        print('thread %s >>> %s' % (threading.current_thread().name, n))
        time.sleep(1)
    print('thread %s ended.' % threading.current_thread().name)

print('thread %s is running...' % threading.current_thread().name)
t = threading.Thread(target=loop, name='LoopThread')
t.start()
t.join()
print('thread %s ended.' % threading.current_thread().name)

The output looks like this:

thread MainThread is running...
thread LoopThread is running...
thread LoopThread >>> 1
thread LoopThread >>> 2
thread LoopThread >>> 3
thread LoopThread >>> 4
thread LoopThread >>> 5
thread LoopThread ended.
thread MainThread ended.

Since every process starts one thread by default, we call that thread the main thread, and the main thread can in turn start new threads. Python's threading module has a current_thread() function that always returns an instance of the current thread. The main thread instance is named MainThread; child threads are named when they are created — here we name the child thread LoopThread. The name is used only for display when printing and has no other meaning; if you don't name a thread, Python automatically names it Thread-1, Thread-2, and so on.
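As a quick check of these naming rules, here is a minimal sketch (the exact default names can vary slightly between Python versions — newer CPython appends the target function's name, e.g. "Thread-1 (work)"):

```python
import threading

def work():
    pass

# Child threads created without a name get automatic default names;
# the main thread is always named MainThread.
a = threading.Thread(target=work)
b = threading.Thread(target=work, name='MyWorker')
print(a.name)                           # e.g. Thread-1
print(b.name)                           # MyWorker
print(threading.current_thread().name)  # MainThread
```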
Lock

The biggest difference between multithreading and multiprocessing is that in multiprocessing, each process gets its own copy of every variable, so processes do not affect each other; in multithreading, all variables are shared among all the threads, so any variable can be modified by any thread. The biggest danger of sharing data between threads is therefore that multiple threads may change the same variable at the same time and corrupt its contents.

Let's see how multiple threads manipulating the same variable at the same time can corrupt its contents:

import threading

# Suppose this is your bank balance:
balance = 0

def change_it(n):
    # Deposit first, withdraw later; the result should be 0:
    global balance
    balance = balance + n
    balance = balance - n

def run_thread(n):
    for i in range(100000):
        change_it(n)

t1 = threading.Thread(target=run_thread, args=(5,))
t2 = threading.Thread(target=run_thread, args=(8,))
t1.start()
t2.start()
t1.join()
t2.join()
print(balance)

We define a shared variable balance with an initial value of 0 and start two threads that each deposit first and withdraw later; in theory the result should be 0. However, since thread scheduling is determined by the operating system, when t1 and t2 run alternately, as long as the loop count is high enough, balance will not necessarily end up as 0.

The reason is that a single statement in a high-level language becomes several instructions when executed by the CPU. Even a simple calculation:

balance = balance + n

takes two steps:

  1. Calculate balance + n and store it in a temporary variable;
  2. Assigns the value of the temporary variable to balance.

That is, it can be seen as:

x = balance + n
balance = x

Since x is a local variable, each of the two threads has its own x when the code executes normally:

Initial value balance = 0

t1: x1 = balance + 5 # x1 = 0 + 5 = 5
t1: balance = x1   # balance = 5
t1: x1 = balance - 5 # x1 = 5 - 5 = 0
t1: balance = x1   # balance = 0

t2: x2 = balance + 8 # x2 = 0 + 8 = 8
t2: balance = x2   # balance = 8
t2: x2 = balance - 8 # x2 = 8 - 8 = 0
t2: balance = x2   # balance = 0

Result balance = 0

But t1 and t2 run alternately, if the operating system executes t1, t2 in the following order:

Initial value balance = 0

t1: x1 = balance + 5 # x1 = 0 + 5 = 5

t2: x2 = balance + 8 # x2 = 0 + 8 = 8
t2: balance = x2   # balance = 8

t1: balance = x1   # balance = 5
t1: x1 = balance - 5 # x1 = 5 - 5 = 0
t1: balance = x1   # balance = 0

t2: x2 = balance - 8 # x2 = 0 - 8 = -8
t2: balance = x2   # balance = -8

Result balance = -8

The reason is that modifying balance requires multiple statements, and a thread may be interrupted while executing them, causing multiple threads to change the contents of the same object out of order.

If two threads deposit and withdraw at the same time, the balance may end up wrong — you certainly don't want your bank balance to somehow go negative — so we must ensure that while one thread is modifying balance, no other thread can modify it.

If we want balance to be computed correctly, we have to put a lock on change_it(). When a thread starts to execute change_it(), we say that the thread has acquired the lock, so other threads cannot execute change_it() at the same time; they can only wait until the lock is released and then acquire it themselves before making changes. Since there is only one lock, no matter how many threads there are, at most one holds the lock at any moment, so there is no conflict. Creating a lock is done with threading.Lock():

balance = 0
lock = threading.Lock()

def run_thread(n):
    for i in range(100000):
        # Acquire the lock first:
        lock.acquire()
        try:
            # Change it without fear:
            change_it(n)
        finally:
            # Be sure to release the lock when done:
            lock.release()

When multiple threads execute lock.acquire() at the same time, only one thread succeeds in acquiring the lock and continues to execute; the others keep waiting until the lock is released.

The thread that acquired the lock must release it when it is done, otherwise the threads waiting for the lock will wait forever and become dead threads. That is why we use try...finally to make sure the lock is always released.
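Since Lock supports the context-manager protocol, the acquire/try/finally pattern above can be written more compactly with a with statement, which releases the lock on exit even if an exception is raised. A sketch of the same balance example using it:

```python
import threading

balance = 0
lock = threading.Lock()

def change_it(n):
    global balance
    balance = balance + n
    balance = balance - n

def run_thread(n):
    for i in range(100000):
        # "with lock" acquires on entry and releases on exit:
        with lock:
            change_it(n)

t1 = threading.Thread(target=run_thread, args=(5,))
t2 = threading.Thread(target=run_thread, args=(8,))
t1.start()
t2.start()
t1.join()
t2.join()
print(balance)  # always 0 now
```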

The good thing about locks is that they ensure a critical piece of code is executed from start to finish by only one thread at a time. The downsides are also many. First, locks prevent truly concurrent execution: code that holds a lock effectively runs in single-threaded mode, which greatly reduces efficiency. Second, since there can be multiple locks, if different threads hold different locks and each tries to acquire the lock held by the other, a deadlock may occur: all the threads hang, unable either to run or to finish, and only the operating system can forcibly terminate them.
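To make the deadlock risk concrete, here is a minimal sketch of one half of that situation: one thread holds lock_b while the main thread, already holding lock_a, tries to take it. If the other thread in turn wanted lock_a, neither could ever proceed. A timeout is used so this demo terminates instead of hanging forever (the lock names are made up for illustration):

```python
import threading, time

lock_a = threading.Lock()
lock_b = threading.Lock()
ready = threading.Event()

def hold_b():
    # This thread takes lock_b and holds it for a while.
    with lock_b:
        ready.set()
        time.sleep(1.0)

t = threading.Thread(target=hold_b)
t.start()
ready.wait()  # make sure lock_b is already held

with lock_a:
    # We hold lock_a and want lock_b, which the other thread holds.
    # Without the timeout, this acquire would block indefinitely.
    got = lock_b.acquire(timeout=0.2)
    if got:
        lock_b.release()

print(got)  # False: lock_b could not be acquired
t.join()
```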
Multi-core CPU

If you are unfortunate enough to have a multi-core CPU, you must be thinking that multiple cores should be able to execute multiple threads simultaneously.

What happens if you write an infinite loop?

Open Activity Monitor on macOS or Task Manager on Windows to monitor the CPU usage of a process.

We can observe that one infinite-loop thread uses up 100% of one CPU core.

With two infinite-loop threads, on a multi-core CPU you can observe CPU usage of 200%, that is, two CPU cores fully occupied.

To max out all the cores of an N-core CPU, you would have to start N infinite-loop threads.

Try writing an infinite loop in Python:

import threading, multiprocessing

def loop():
    x = 0
    while True:
        x = x ^ 1

for i in range(multiprocessing.cpu_count()):
    t = threading.Thread(target=loop)
    t.start()

This starts as many threads as there are CPU cores, yet on a 4-core CPU you can observe CPU usage of only about 160%, i.e. less than two cores are used.

Even with 100 threads started, the utilization is only about 170%, still less than two cores.

But rewrite the same infinite loop in C, C++, or Java and it can easily max out all the cores: 400% on 4 cores, 800% on 8 cores. So why can't Python?

Because although Python's threads are real threads, the interpreter executes code under a GIL: the Global Interpreter Lock. Before any Python thread executes, it must first acquire the GIL; the interpreter then periodically releases it to give other threads a chance to run (old CPython released it every 100 bytecode instructions; CPython 3.2 and later switch after a time interval, 5 ms by default). The GIL effectively puts a single lock around all thread execution, so multithreaded Python code can only run alternately: even 100 threads on a 100-core CPU can use only 1 core.
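In modern CPython, the time-based switch interval that replaced the bytecode count can be inspected and tuned through the sys module:

```python
import sys

# Modern CPython releases the GIL after a time-based switch interval
# rather than a bytecode count; the default is 5 ms (0.005 s).
print(sys.getswitchinterval())   # 0.005 by default
sys.setswitchinterval(0.01)      # ask for less frequent switching
print(sys.getswitchinterval())   # 0.01
sys.setswitchinterval(0.005)     # restore the default
```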

The GIL is a historical legacy of the design of Python's interpreter. The interpreter we usually use is the official implementation, CPython; to truly exploit multiple cores, we would have to use an interpreter without a GIL.

So in Python you can use multithreading, but don't expect it to make effective use of multiple cores. If you must use multiple cores from multiple threads, it can only be done through C extensions, but that sacrifices Python's simplicity.

However, there is no need to worry too much. Although Python cannot use multiple cores through multithreading, it can run multicore tasks through multiple processes: each Python process has its own independent GIL, and the processes do not affect each other.
Wrap-up

Multithreaded programming has a complex model: threads are prone to conflicts and must be isolated with locks, while at the same time you must be careful about deadlocks.

Because of the GIL in the Python interpreter's design, multiple threads cannot utilize multiple cores. Multithreaded concurrency in Python is a beautiful dream.