
A brief discussion on Golang's Work Stealing mechanism

Go's runtime uses a Work Stealing scheduling policy to distribute Goroutines across the available threads (called M, for Machine). This maximizes CPU usage and reduces the overhead of task scheduling. Under this mechanism, the per-processor task queues and the scheduler dynamically balance the load, improving concurrency performance and throughput.

Go's scheduler uses P (Processor) as the bridge between M and Goroutines. Each P maintains a local Goroutine queue, and a newly created Goroutine is first placed in the local queue of the P on which it was created. The number of Ps determines how many Goroutines can run in parallel, and each P must be associated with an M in order to execute Goroutines.
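The number of Ps is controlled by GOMAXPROCS and defaults to the number of logical CPUs. As a small, self-contained illustration (this uses only the public runtime API, not scheduler internals), the following program prints the values involved:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    // Number of Ps: how many Goroutines may execute Go code in parallel.
    // Passing 0 queries the current value without changing it.
    fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
    // Number of logical CPUs visible to the process.
    fmt.Println("NumCPU:    ", runtime.NumCPU())
    // Current number of Goroutines (G) in existence.
    fmt.Println("Goroutines:", runtime.NumGoroutine())
}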

The core idea of the Work Stealing mechanism: each operating system thread (M) works through a local task queue and executes the coroutines in its own queue first. When the P queue of an M is empty while other Ps still have tasks, that M will try to "steal" some coroutines from other Ps to execute, achieving load balancing.

Basic working principle

  • Task queue: Each worker thread has its own double-ended queue (deque) for storing tasks. When a thread produces a new task, it puts the task into its own queue. This queue corresponds to the local run queue of the P (Processor) mentioned above (a toy sketch of such a deque appears after this list).

  • Executing tasks: An M first obtains tasks from its own P and executes them. If its task queue is empty, it tries to steal tasks from the queues of other Ps.

  • Stealing tasks: When an M finds that its P's queue is empty, it randomly selects another P and steals tasks from the opposite end of that P's queue. This reduces contention, because the owning thread works on one end of its queue while thieves only take from the other end.

  • Load balancing: Through this mechanism the system dynamically balances the load. If one thread accumulates more tasks, idle threads can help process them, avoiding the situation where some threads are overloaded while others sit idle.

  • Global queue: If all local P queues are empty, the scheduler fetches tasks from the global run queue, which holds Goroutines that have not been placed in any local queue.
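The real per-P run queue is a fixed-size, mostly lock-free ring buffer inside the runtime. The toy sketch below only illustrates the "owner works on one end, thieves take the other" idea from the list above, using a mutex and invented names (Task, Deque) for brevity:

package main

import "sync"

// Task stands in for a runnable goroutine (G).
type Task func()

// Deque is a toy double-ended work queue: the owner pushes and pops at the
// back, while thieves steal from the front. A mutex keeps the sketch short;
// the runtime's real per-P queue is lock-free.
type Deque struct {
    mu    sync.Mutex
    tasks []Task
}

// PushBack is called by the owning worker when it creates a new task.
func (d *Deque) PushBack(t Task) {
    d.mu.Lock()
    defer d.mu.Unlock()
    d.tasks = append(d.tasks, t)
}

// PopBack is called by the owning worker to get its next task.
func (d *Deque) PopBack() (Task, bool) {
    d.mu.Lock()
    defer d.mu.Unlock()
    if len(d.tasks) == 0 {
        return nil, false
    }
    t := d.tasks[len(d.tasks)-1]
    d.tasks = d.tasks[:len(d.tasks)-1]
    return t, true
}

// StealFront is called by an idle worker to take work from the other end.
func (d *Deque) StealFront() (Task, bool) {
    d.mu.Lock()
    defer d.mu.Unlock()
    if len(d.tasks) == 0 {
        return nil, false
    }
    t := d.tasks[0]
    d.tasks = d.tasks[1:]
    return t, true
}

func main() {
    var d Deque
    d.PushBack(func() { println("G1") })
    d.PushBack(func() { println("G2") })

    if t, ok := d.StealFront(); ok { // an idle worker steals from the front
        t() // prints G1
    }
    if t, ok := d.PopBack(); ok { // the owner keeps working from the back
        t() // prints G2
    }
}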

Here is a simplified diagram showing the interaction between P, M and Goroutine:

     P1       P2       P3
     |        |        |
     v        v        v
    [G1,G2] [G3]   [G4,G5,G6] 
     ^        ^        ^
     |        |        |
     M1       M2       M3

In this figure there are 3 Ps (P1, P2 and P3), each with a local Goroutine queue. M1, M2 and M3 are three threads; each is associated with a P and takes Goroutines out of that P's queue to execute. When M1 completes G1, it takes G2 from P1's queue. If P1's queue is empty, M1 tries to "steal" a Goroutine from the queue of P2 or P3.
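You can observe this layout on a running program with the scheduler trace: setting the environment variable GODEBUG=schedtrace=1000 makes the runtime print, once per second, the global run queue length (runqueue=...) and a bracketed list of per-P local run queue lengths. A throwaway program to try it with (the busy work here is arbitrary):

package main

import (
    "runtime"
    "time"
)

func main() {
    runtime.GOMAXPROCS(3) // mimic the three Ps (P1, P2, P3) in the diagram

    // Create far more Goroutines than Ps so the local run queues fill up
    // and stealing has something to do.
    for i := 0; i < 100; i++ {
        go func() {
            sum := 0
            for {
                sum++ // arbitrary busy work to keep the G runnable
            }
        }()
    }
    time.Sleep(5 * time.Second)
}

Run it as GODEBUG=schedtrace=1000 go run main.go and watch how the runqueue value and the per-P numbers change as work is redistributed.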

When an M cannot find a runnable G in its local queue, the global G queue, or the Netpoller, it steals Gs from other Ps and places them on the current P. The search order, and how many Gs are taken in each case, is as follows:

1) If the global queue has Gs, the number of Gs taken from the global queue is N = min(len(GRQ)/GOMAXPROCS + 1, len(GRQ)/2), spreading the load according to GOMAXPROCS (a small sketch of this formula appears after this list).

2) If the Netpoller has Gs (Goroutines that were blocked on network I/O and are now ready), the number of Gs taken from the Netpoller is N = 1.

3) If Gs are stolen from another P, the number of Gs taken is N = len(LRQ)/2, i.e. half of that P's local run queue (again for load balancing).

4) If, after several attempts, no runnable Goroutine is found, the worker thread goes to sleep and waits to be woken up by another worker thread.
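The formulas above only describe how many Gs are taken in each case. As an illustration, the global-queue rule from step 1 can be written as a small standalone helper; the names here are invented for this sketch (the runtime's own version of this logic lives in globrunqget):

package main

import "fmt"

// globalGrabCount sketches the rule quoted above for how many Gs a P takes
// from the global run queue (GRQ) in one go:
//     N = min(len(GRQ)/GOMAXPROCS + 1, len(GRQ)/2)
func globalGrabCount(grqLen, gomaxprocs int) int {
    n := grqLen/gomaxprocs + 1
    if half := grqLen / 2; n > half {
        n = half
    }
    return n
}

func main() {
    // With 32 Gs waiting globally and GOMAXPROCS=4, a P grabs min(9, 16) = 9.
    fmt.Println(globalGrabCount(32, 4))
}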

See the runtime's stealWork function for how Gs are stolen from other Ps. The stealing process is as follows:

1) Select the P to steal from

2) Steal half of the Gs from that P

Select the P to steal from

The essence of stealing is to traverse all Ps and check whether a P's run queue contains Goroutines. If it does, half of them are taken into the run queue of the current worker thread.

To ensure fairness, the Ps are not visited in array-subscript order. Instead, a pseudo-random order is used to traverse each P in allp, so that the elements of allp are not visited in the same order on every traversal.

offset := uint32(random()) % nprocs
coprime := a randomly chosen number that is less than nprocs and coprime with nprocs

const stealTries = 4 // try up to 4 rounds
for i := 0; i < stealTries; i++ {
    // visit all Ps in a pseudo-random order
    for j := uint32(0); j < nprocs; j++ {
        p := allp[offset]
        // try to steal goroutines from p's run queue
        if the steal succeeded {
            break
        }
        offset = (offset + coprime) % nprocs
    }
}

As you can see, different random numbers produce different traversal orders of the Ps, but it is guaranteed that after nprocs iterations every P has been visited exactly once.
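The key property is that stepping through allp by an offset that is coprime with nprocs visits every index exactly once before repeating. Below is a standalone sketch of that traversal, outside the runtime and with made-up helper names (the runtime precomputes its coprimes rather than picking them on the fly):

package main

import (
    "fmt"
    "math/rand"
)

// gcd is used to test whether a candidate step is coprime with nprocs.
func gcd(a, b uint32) uint32 {
    for b != 0 {
        a, b = b, a%b
    }
    return a
}

// coprimeOf picks a random step in [1, nprocs) that is coprime with nprocs,
// mirroring the idea in the pseudocode above.
func coprimeOf(nprocs uint32) uint32 {
    for {
        c := uint32(rand.Intn(int(nprocs)-1)) + 1
        if gcd(c, nprocs) == 1 {
            return c
        }
    }
}

func main() {
    const nprocs = 8
    offset := uint32(rand.Intn(nprocs))
    step := coprimeOf(nprocs)

    // Exactly nprocs iterations visit each P index once, in a shuffled order.
    for i := 0; i < nprocs; i++ {
        fmt.Print(offset, " ")
        offset = (offset + step) % nprocs
    }
    fmt.Println()
}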

Steal half of the Gs from P

After the victim P is selected, the runtime function runqsteal is called to steal Goroutines from that P's run queue. runqsteal in turn calls runqgrab, which grabs half of the Gs from P's local run queue in one batch.

func runqgrab(_p_ *p, batch *[256]guintptr, batchHead uint32, stealRunNextG bool) uint32 {
    for {
        h := atomic.LoadAcq(&_p_.runqhead) // load-acquire, synchronize with other consumers
        t := atomic.LoadAcq(&_p_.runqtail) // load-acquire, synchronize with the producer
        n := t - h                         // how many goroutines are currently in the queue
        n = n - n/2                        // take half of them (rounded up)
        if n == 0 {
            ......
            return ......
        }
        ...... // copy the stolen Gs into batch and advance the queue head
        return n
    }
}
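The only arithmetic to note is n = n - n/2, which rounds up: the thief takes the larger half of the victim's queue when its length is odd. A quick standalone check of that arithmetic:

package main

import "fmt"

func main() {
    // n = n - n/2 takes the larger half (ceiling) of the victim's queue.
    for _, qlen := range []uint32{0, 1, 4, 5} {
        n := qlen - qlen/2
        fmt.Printf("queue length %d -> steal %d\n", qlen, n)
    }
}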

Advantages

  • Efficient use of resources: By dynamically balancing the load, all threads stay as busy as possible, improving CPU utilization.

  • Reduced contention: When stealing, a thief only accesses one end of another thread's queue, reducing contention and the need for locks.

  • Flexibility: The scheduler adapts to load changes and automatically rebalances when the amount of work is uneven.

This is all for this article on Golang's Work Stealing mechanism. For more on the topic, please look up my previous related articles. I hope everyone will continue to support me!